
perf: avoid per-call ctx allocation in next_no_xml_space #58

Open

mathieu17g wants to merge 1 commit into JuliaComputing:main from mathieu17g:perf-share-default-ctx

perf: avoid per-call ctx allocation in next_no_xml_space#58
mathieu17g wants to merge 1 commit into
JuliaComputing:mainfrom
mathieu17g:perf-share-default-ctx

Conversation

mathieu17g commented Apr 30, 2026

Summary

next_no_xml_space is the fast path of Raw's document-order traversal — it's selected by next(::Raw) when the underlying byte buffer contains no xml:space attribute anywhere. In that situation the per-Raw ctx (xml:space inheritance stack) field is always Bool[false] and is never mutated, but the function nevertheless allocates a fresh [false] on every call:

```julia
function next_no_xml_space(o::Raw)
    ...
    ctx = [false]                                  # ← fresh allocation per call
    ...
    return Raw(type, depth, i, j - i, data, ctx, has_xml_space)
end
```

For documents with thousands of XML nodes that allocation dominates the cumulative memory profile of any consumer that walks the tree. The change here just reuses the parent's ctx reference instead:

```julia
ctx = o.ctx
```

Why this is safe

next_no_xml_space is selected only when o.has_xml_space === false, in which case ctx descends from the root's Bool[false] (line 73) and is never mutated — the only function that writes to ctx is next_xml_space, on the disjoint has_xml_space === true branch. Inside next_no_xml_space itself, ctx is read once and passed to the Raw(...) constructor; no push! / pop! / setindex!.
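To make the aliasing argument concrete, here is a minimal toy sketch — `Node` and the `next_*` names are illustrative stand-ins, not XML.jl's actual types or API. Sharing the parent's vector produces the same observable value as allocating a fresh one, so long as the fast path never mutates it:

```julia
# Toy model of the two strategies; `Node` stands in for XML.Raw and the
# `next_*` names are hypothetical, not XML.jl's API.
struct Node
    depth::Int
    ctx::Vector{Bool}   # xml:space inheritance stack
end

# Old behavior: a fresh `[false]` per call.
next_alloc(n::Node) = Node(n.depth + 1, [false])

# This PR: reuse the parent's ctx reference.
next_shared(n::Node) = Node(n.depth + 1, n.ctx)

root = Node(0, Bool[false])

# Same value either way...
@assert next_alloc(root).ctx == next_shared(root).ctx == Bool[false]

# ...but only the old path builds a new buffer for every node.
@assert next_alloc(root).ctx !== root.ctx
@assert next_shared(root).ctx === root.ctx
```

The `===` identity checks are exactly why the mutation-freedom argument matters: every shared node observes the same vector, which is fine only because nothing on this branch ever writes to it.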

A module-level const _DEFAULT_CTX = Bool[false] would be an equally allocation-free alternative if you'd prefer not to rely on the propagation chain — happy to switch.
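For reference, the constant-vector alternative would look like the sketch below (all names hypothetical, not in the PR); both it and `ctx = o.ctx` are allocation-free on the fast path:

```julia
# Hypothetical alternative, not what the PR does: one module-level
# default context shared by every call on the no-xml:space fast path.
const _DEFAULT_CTX = Bool[false]

make_ctx_alloc() = [false]          # original code: fresh vector per call
make_ctx_const() = _DEFAULT_CTX     # alternative: always the same vector

# Wrap the measurements so a first (compiling) call can be discarded.
bytes_alloc() = @allocated make_ctx_alloc()
bytes_const() = @allocated make_ctx_const()
bytes_alloc(); bytes_const()        # warm-up: compile everything

@assert bytes_alloc() > 0           # ~32 B per call on 64-bit Julia
@assert bytes_const() == 0          # no allocation at all
@assert make_ctx_const() === make_ctx_const()  # identity, not just equality
```

Either way the fast path stops scaling allocations with node count; `ctx = o.ctx` simply achieves that without introducing a new module-level global.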

Why it matters

Measured on a downstream consumer (FastKML.jl extracting a DataFrame from a 47 MiB sample KML, walking ~1 M Raw nodes during the traversal), this allocation site was contributing ~60 MiB of cumulative tracked allocation. Removing it cuts the consumer's end-to-end memory by about a sixth and lops ~7% off wall-clock time. The same pattern would apply to any package using XML.LazyNode or XML.Raw to walk a large document.

(See also #59 for a complementary additive change adding next! / prev! for in-place traversal — independent and stackable.)

Verification

  • The full XML.jl test suite passes locally on this branch (Julia 1.12), including all 71 xml:space cases — those exercise the next_xml_space path that's deliberately unchanged here.
  • No public API change, no behavioral change observable to existing callers.

Diff size

One real line changed:

```diff
-    ctx = [false]
+    ctx = o.ctx
```

Plus a short comment explaining the rationale, so future readers don't re-allocate it back without understanding the constraint.

`next_no_xml_space` is only entered when the source document has no
`xml:space` attribute anywhere, in which case the per-node `ctx`
(xml:space inheritance stack) is always `Bool[false]` and is never
mutated. Allocating a fresh `[false]` on every call therefore costs
~32 B × N nodes for no semantic benefit. Reuse the parent's `ctx`
instead.

On a 47 MiB representative KML file (5,411 Placemarks, ~1 M XML nodes
traversed during a `DataFrame` extraction in a downstream consumer),
this drops cumulative tracked allocations by ~60 MiB without any
behavior or API change. Wall-clock parsing time is unchanged.

`next_xml_space` is unaffected: that path mutates `ctx` via `push!`/`pop!` when
descending into elements with `xml:space`, so each `Raw` still needs its
own copy.
Contributor

joshday commented Apr 30, 2026

FYI there's a bit of a rewrite in progress:

#54

@mathieu17g
Author

Indeed, I had not seen it, sorry.

I will wait for its completion to see whether the issues addressed in this PR and in PR #59 are still present.
