Fix O(N²) duplicate-attribute check in Attributes iterator by qifan-sailboat · Pull Request #971 · tafia/quick-xml

qifan-sailboat · 2026-06-22T22:30:54Z

Fixes #969.

IterState::check_for_duplicates did a linear Vec scan of all already-seen attribute name ranges for every new attribute, so a start tag with N distinct attribute names cost O(N²/2) byte comparisons. On untrusted XML this is a CPU-exhaustion DoS — see #969 for measurements (N=80,000 ≈ 6.1 s release; N=800,000 ≈ 10 min) and the demonstrated downstream impact on NLnet Labs Routinator.

This adds a HashSet<u64> of DefaultHasher hashes of the key bytes as an O(1) pre-filter:

a fresh hash means the key cannot be a duplicate → push and return Ok (the no-duplicate path is now amortised O(1) per attribute / O(N) per start tag);
on a hash hit (a real duplicate, or an astronomically rare 64-bit collision) fall back to the existing linear scan to recover the exact previous position for AttrError::Duplicated(new, prev) — error semantics are unchanged.

The set is lazily allocated (Option<HashSet<u64>>) so IterState::new, Attributes::new and Attributes::html stay const fn.

N (distinct attrs)	before	after
10,000	75 ms	0 ms
20,000	303 ms	1 ms
40,000	1,251 ms	2 ms
80,000	6,109 ms	4 ms

A timing regression test is included in events::attributes::xml::duplicated::with_check.

`IterState::check_for_duplicates` previously did a linear `Vec` scan of all already-seen attribute name ranges for every new attribute, so a start tag with N distinct attribute names cost O(N²/2) byte comparisons. On untrusted XML this is a CPU-exhaustion DoS: N=80,000 took ~6.1 s in release; N=800,000 takes ~10 minutes and cannot be interrupted by an I/O-gated timeout because the loop performs no reads. Add a `HashSet<u64>` of `DefaultHasher` hashes of the key bytes as an O(1) pre-filter. A fresh hash means the key cannot be a duplicate. On a hash hit (a real duplicate, or an astronomically rare 64-bit collision) we fall back to the existing linear scan to recover the exact previous position for `AttrError::Duplicated(new, prev)`, so error semantics are unchanged. The set is lazily allocated so `IterState::new` (and the public `Attributes::new`/`Attributes::html`) stay `const fn`. N=80,000 distinct attributes: 6109 ms → 4 ms. Fixes tafia#969. Co-Authored-By: Claude <noreply@anthropic.com>

dralley · 2026-06-22T22:46:49Z

And the distinct attrs column, I assume that represents N attributes within one element, as opposed to N attributes interspersed across many elements?

qifan-sailboat · 2026-06-22T23:32:46Z

Yes. It means N distinct attributes on a single start tag.

codecov-commenter · 2026-06-23T03:36:42Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 57.37%. Comparing base (e00ae5c) to head (703c47b).
⚠️ Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
src/events/attributes.rs	95.83%	1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #971      +/-   ##
==========================================
+ Coverage   57.31%   57.37%   +0.06%     
==========================================
  Files          46       46              
  Lines       18197    18207      +10     
==========================================
+ Hits        10429    10447      +18     
+ Misses       7768     7760       -8

Flag	Coverage Δ
unittests	`57.37% <95.83%> (+0.06%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

dralley · 2026-06-23T03:37:05Z

-                .find(|r| slice[(*r).clone()] == slice[key.clone()])
-            {
-                return Err(AttrError::Duplicated(key.start, prev.start));
+            let mut h = DefaultHasher::new();


SipHash (the Rust default) is fairly expensive in computational terms, though I suppose if the goal is DoS-resistance is the most "safe" option.

Nonetheless would you mind running some benchmarks (e.g. https://github.com/tafia/quick-xml/blob/master/benches/microbenches.rs#L177) to compare how the existing implementation compares to this implementation w/ SipHash and aHash for normal inputs? Or at least the former two.

I would just like an idea of what impact this will have for the standard case.

dralley · 2026-06-23T03:38:09Z

Please run cargo fmt

dralley · 2026-06-23T03:59:02Z

+            /// so a start tag with many distinct attribute names cost O(N²) byte
+            /// comparisons. With the hash pre-filter the same input is O(N).
+            #[test]
+            fn many_distinct_attributes_is_linear() {


I don't love the idea of having something like this in a unit test. I suppose if it's reliable enough, it could be OK. On the other hand maybe it's best to just have a criterion microbenchmark, those tend to get run for any important implementation changes.

@Mingun , thoughts

Yes, I agree, this should be placed in ./benches directory. I suggest to place it into ./benches/issue971.rs. Benchmarks runs as tests on CI, so we will get the same results as for unit test:

quick-xml/.github/workflows/rust.yml

Lines 64 to 65 in 9aaea92

- name: Run tests + benchmarks

run: cargo test --all-features --benches --tests

Mingun

Please move the test into the benchmark directory and could you investigate the possibility to address other comments. You may force-push the changes, because that PR is small enough (and anyway, we prefer to have clean history).

Mingun · 2026-06-23T09:33:29Z

+            /// so a start tag with many distinct attribute names cost O(N²) byte
+            /// comparisons. With the hash pre-filter the same input is O(N).
+            #[test]
+            fn many_distinct_attributes_is_linear() {


Yes, I agree, this should be placed in ./benches directory. I suggest to place it into ./benches/issue971.rs. Benchmarks runs as tests on CI, so we will get the same results as for unit test:

quick-xml/.github/workflows/rust.yml

Lines 64 to 65 in 9aaea92

- name: Run tests + benchmarks

run: cargo test --all-features --benches --tests

Mingun · 2026-06-23T09:36:50Z

+    /// the duplicate check is amortised O(N) over the whole start tag instead of
+    /// O(N²). Lazily allocated on first use so that [`IterState::new`] can stay
+    /// `const`.
+    key_hashes: Option<HashSet<u64>>,


It seems, we should use NoHasher here because we already put the hash to the set.

Mingun · 2026-06-23T09:39:01Z

+  pre-filter so the no-duplicate path is amortised O(1) per attribute; the
+  exact `AttrError::Duplicated(new, prev)` positions are unchanged.
+
+[#969]: https://github.com/tafia/quick-xml/issues/969


We put links at the end of version section (here you put it at the end of Bug Fixes section). Could you please move it below (keep the 2 blank lines before the next ## section)

dralley reviewed Jun 23, 2026

View reviewed changes

Mingun requested changes Jun 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix O(N²) duplicate-attribute check in Attributes iterator#971

Fix O(N²) duplicate-attribute check in Attributes iterator#971
qifan-sailboat wants to merge 1 commit into
tafia:masterfrom
qifan-sailboat:fix/969-attributes-dup-check-quadratic

qifan-sailboat commented Jun 22, 2026

Uh oh!

dralley commented Jun 22, 2026

Uh oh!

qifan-sailboat commented Jun 22, 2026

Uh oh!

codecov-commenter commented Jun 23, 2026

Uh oh!

dralley Jun 23, 2026

Uh oh!

dralley commented Jun 23, 2026

Uh oh!

dralley Jun 23, 2026

Uh oh!

Mingun Jun 23, 2026

Uh oh!

Mingun left a comment

Uh oh!

Mingun Jun 23, 2026

Uh oh!

Mingun Jun 23, 2026

Uh oh!

Mingun Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	- name: Run tests + benchmarks
	run: cargo test --all-features --benches --tests

Conversation

qifan-sailboat commented Jun 22, 2026

Uh oh!

dralley commented Jun 22, 2026

Uh oh!

qifan-sailboat commented Jun 22, 2026

Uh oh!

codecov-commenter commented Jun 23, 2026

Codecov Report

Uh oh!

dralley Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

dralley commented Jun 23, 2026

Uh oh!

dralley Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Mingun Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Mingun left a comment

Choose a reason for hiding this comment

Uh oh!

Mingun Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Mingun Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Mingun Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants