
perf: avoid per-call ctx allocation in next_no_xml_space #58

Open

mathieu17g wants to merge 1 commit into JuliaComputing:main from mathieu17g:perf-share-default-ctx

perf: avoid per-call ctx allocation in next_no_xml_space#58
mathieu17g wants to merge 1 commit into
JuliaComputing:mainfrom
mathieu17g:perf-share-default-ctx

Conversation

mathieu17g commented Apr 30, 2026

Summary

next_no_xml_space is the fast path of Raw's document-order traversal — it's selected by next(::Raw) when the underlying byte buffer contains no xml:space attribute anywhere. In that situation the per-Raw ctx (xml:space inheritance stack) field is always Bool[false] and is never mutated, but the function nevertheless allocates a fresh [false] on every call:

```julia
function next_no_xml_space(o::Raw)
    ...
    ctx = [false]                                  # ← fresh allocation per call
    ...
    return Raw(type, depth, i, j - i, data, ctx, has_xml_space)
end
```

For documents with thousands of XML nodes that allocation dominates the cumulative memory profile of any consumer that walks the tree. The change here just reuses the parent's ctx reference instead:

```julia
ctx = o.ctx
```

Why this is safe

next_no_xml_space is selected only when o.has_xml_space === false, in which case ctx descends from the root's Bool[false] (line 73) and is never mutated — the only function that writes to ctx is next_xml_space, on the disjoint has_xml_space === true branch. Inside next_no_xml_space itself, ctx is read once and passed to the Raw(...) constructor; no push! / pop! / setindex!.
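To make the aliasing argument concrete, here is a minimal toy sketch — `Node` and the `next_*` names are illustrative stand-ins, not XML.jl's actual types or API. Sharing the parent's vector produces the same observable value as allocating a fresh one, so long as the fast path never mutates it:

```julia
# Toy model of the two strategies; `Node` stands in for XML.Raw and the
# `next_*` names are hypothetical, not XML.jl's API.
struct Node
    depth::Int
    ctx::Vector{Bool}   # xml:space inheritance stack
end

# Old behavior: a fresh `[false]` per call.
next_alloc(n::Node) = Node(n.depth + 1, [false])

# This PR: reuse the parent's ctx reference.
next_shared(n::Node) = Node(n.depth + 1, n.ctx)

root = Node(0, Bool[false])

# Same value either way...
@assert next_alloc(root).ctx == next_shared(root).ctx == Bool[false]

# ...but only the old path builds a new buffer for every node.
@assert next_alloc(root).ctx !== root.ctx
@assert next_shared(root).ctx === root.ctx
```

The `===` identity checks are exactly why the mutation-freedom argument matters: every shared node observes the same vector, which is fine only because nothing on this branch ever writes to it.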

A module-level const _DEFAULT_CTX = Bool[false] would be an equally allocation-free alternative if you'd prefer not to rely on the propagation chain — happy to switch.
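For reference, the constant-vector alternative would look like the sketch below (all names hypothetical, not in the PR); both it and `ctx = o.ctx` are allocation-free on the fast path:

```julia
# Hypothetical alternative, not what the PR does: one module-level
# default context shared by every call on the no-xml:space fast path.
const _DEFAULT_CTX = Bool[false]

make_ctx_alloc() = [false]          # original code: fresh vector per call
make_ctx_const() = _DEFAULT_CTX     # alternative: always the same vector

# Wrap the measurements so a first (compiling) call can be discarded.
bytes_alloc() = @allocated make_ctx_alloc()
bytes_const() = @allocated make_ctx_const()
bytes_alloc(); bytes_const()        # warm-up: compile everything

@assert bytes_alloc() > 0           # ~32 B per call on 64-bit Julia
@assert bytes_const() == 0          # no allocation at all
@assert make_ctx_const() === make_ctx_const()  # identity, not just equality
```

Either way the fast path stops scaling allocations with node count; `ctx = o.ctx` simply achieves that without introducing a new module-level global.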

Why it matters

Measured on a downstream consumer (FastKML.jl extracting a DataFrame from a 47 MiB sample KML, walking ~1 M Raw nodes during the traversal), this allocation site was contributing ~60 MiB of cumulative tracked allocation. Removing it cuts the consumer's end-to-end memory by about a sixth and lops ~7% off wall-clock time. The same pattern would apply to any package using XML.LazyNode or XML.Raw to walk a large document.

(See also #59 for a complementary additive change adding next! / prev! for in-place traversal — independent and stackable.)

Verification

  • The full XML.jl test suite passes locally on this branch (Julia 1.12), including all 71 xml:space cases — those exercise the next_xml_space path that's deliberately unchanged here.
  • No public API change, no behavioral change observable to existing callers.

Diff size

One real line changed:

```diff
-    ctx = [false]
+    ctx = o.ctx
```

Plus a short comment explaining the rationale, so future readers don't re-allocate it back without understanding the constraint.

`next_no_xml_space` is only entered when the source document has no
`xml:space` attribute anywhere, in which case the per-node `ctx`
(xml:space inheritance stack) is always `Bool[false]` and is never
mutated. Allocating a fresh `[false]` on every call therefore costs
~32 B × N nodes for no semantic benefit. Reuse the parent's `ctx`
instead.

On a 47 MiB representative KML file (5,411 Placemarks, ~1 M XML nodes
traversed during a `DataFrame` extraction in a downstream consumer),
this drops cumulative tracked allocations by ~60 MiB without any
behavior or API change. Wall-clock parsing time is unchanged.

`next_xml_space` is unaffected: that path mutates `ctx` via `push!`/`pop!` when
descending into elements with `xml:space`, so each `Raw` still needs its
own copy.
Contributor

joshday commented Apr 30, 2026

FYI there's a bit of a rewrite in progress:

#54

@mathieu17g
Author

Indeed, I had not seen it, sorry.

I will wait for its completion to see whether the issues addressed in this PR and in PR #59 are still present.
