feature: support rfc-compliant bracketed ipv6 addresses in URLs.#106

Open
andrewsuperlegit wants to merge 5 commits into robinst:main from andrewsuperlegit:issue-78-support-ipv6

Conversation

@andrewsuperlegit andrewsuperlegit commented Jun 6, 2026

@robinst, this is a fix for issue 78.

Updates URL parsing to support bracketed IPv6 addresses, per RFC 3986 and RFC 6874, within a URL that has a valid scheme.

  • replaces the for loop in find_authority_end with a while let loop to handle IPv6 extraction, which apparently also allows Rust to use a jump table
  • when an open bracket is encountered, standard domain validation is suspended. the parser then accepts only valid IPv6 characters, including the percent sign that introduces a zone identifier, until the closing bracket is found
  • keeps the performance regression on standard plaintext near zero by ensuring normal characters bypass the IPv6 branch
  • adds dedicated IPv6 benchmark cases and a substantial number of IPv6 unit tests
  • centralizes some testing utilities into tests/common/mod.rs to avoid duplicating test logic
  • clippy remains undefeated

NOTE: this adds support for IPv6 addresses only within valid schemes. Schemeless bracketed IPs are intentionally excluded to prevent performance regressions on standard text.
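
For readers who want a picture of the approach described in the bullets above, here is a minimal sketch. It is not the code from this PR: the function name `authority_end_sketch`, its return type, and the exact character sets are assumptions made for illustration, and it only checks characters, not whether the address is a structurally valid IPv6 literal. It shows why a `while let` over an explicit iterator is convenient: the bracket branch can keep consuming from the same iterator, which a plain `for` loop over that iterator would not allow.

```rust
/// Hypothetical sketch (not the PR's implementation): returns the byte index
/// just past the authority, or `None` if a bracketed literal is malformed.
fn authority_end_sketch(s: &str) -> Option<usize> {
    let mut chars = s.char_indices();
    let mut end = 0;
    // `while let` keeps `chars` available inside the loop body.
    while let Some((i, c)) = chars.next() {
        match c {
            // Common case: ordinary host/port characters never enter the IPv6 branch.
            // All of these are ASCII, so the next byte index is `i + 1`.
            'a'..='z' | 'A'..='Z' | '0'..='9' | '.' | '-' | ':' => end = i + 1,
            '[' => {
                // Domain validation is suspended until the closing bracket.
                let mut in_zone = false;
                let mut closed = false;
                for (j, d) in chars.by_ref() {
                    match d {
                        ']' => {
                            closed = true;
                            end = j + 1;
                            break;
                        }
                        // The first '%' switches to zone-identifier characters.
                        '%' if !in_zone => in_zone = true,
                        // Assumption: zone ids use RFC 3986 unreserved characters only.
                        _ if in_zone && (d.is_ascii_alphanumeric() || "-._~".contains(d)) => {}
                        // Address part: hex digits, ':' and '.' (embedded IPv4).
                        _ if !in_zone && (d.is_ascii_hexdigit() || d == ':' || d == '.') => {}
                        _ => return None,
                    }
                }
                if !closed {
                    return None;
                }
            }
            // Anything else (for example '/') ends the authority.
            _ => break,
        }
    }
    Some(end)
}
```

With this sketch, an input such as `[2001:db8::1]:8080/path` yields the index just past the port, while an unclosed bracket or a non-hex character inside the brackets yields `None`. A second `%` inside the zone identifier is rejected here, which is a simplification.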

@andrewsuperlegit

Author

I think I'm going to do a bit of a refactor to clean up the contribution. Let me know what you think of it overall, but I'm going to extract my portion into something that's a little easier to read.

… that describes what is happening. also move the ipv6 character matching and zone_id matching to their own helper functions for legibility
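
To illustrate the kind of helper extraction this commit message describes, here is a rough sketch. The names `is_ipv6_char`, `is_zone_id_char`, and `is_valid_bracket_content` are hypothetical and are not taken from the PR; the allowed character sets are also an assumption (hex digits, `:` and `.` for the address, RFC 3986 unreserved characters for the zone identifier).

```rust
/// Hypothetical helper: characters allowed in the address part of a
/// bracketed literal (hex digits, ':' separators, '.' for embedded IPv4).
fn is_ipv6_char(c: char) -> bool {
    c.is_ascii_hexdigit() || c == ':' || c == '.'
}

/// Hypothetical helper: characters allowed in a zone identifier.
/// Assumption: only RFC 3986 unreserved characters are accepted.
fn is_zone_id_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '-' | '.' | '_' | '~')
}

/// Checks the text between '[' and ']' at the character level only.
/// It does not verify that the address is a well-formed IPv6 literal.
fn is_valid_bracket_content(inner: &str) -> bool {
    match inner.split_once('%') {
        // Address followed by a zone identifier, e.g. "fe80::1%25eth0".
        Some((addr, zone)) => {
            !addr.is_empty()
                && addr.chars().all(is_ipv6_char)
                && !zone.is_empty()
                && zone.chars().all(is_zone_id_char)
        }
        // Plain address without a zone identifier.
        None => !inner.is_empty() && inner.chars().all(is_ipv6_char),
    }
}
```

Keeping the two predicates separate means the main loop can read as "address characters, then optionally zone characters", which seems to be the legibility gain the refactor is after.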
@andrewsuperlegit andrewsuperlegit force-pushed the issue-78-support-ipv6 branch from 6a85bd7 to af190c7 Compare June 6, 2026 03:32
@andrewsuperlegit

Author

full disclosure, after the refactor i noticed the following performance regressions:


Gnuplot not found, using plotters backend
no_links                time:   [13.742 ns 13.762 ns 13.784 ns]
                        change: [-2.2038% -1.7054% -1.2243%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

some_links              time:   [142.94 ns 143.55 ns 144.27 ns]
                        change: [+12.596% +12.965% +13.367%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

heaps_of_links          time:   [517.40 ns 521.46 ns 526.98 ns]
                        change: [+7.2029% +9.0657% +11.216%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

some_links_without_scheme
                        time:   [181.52 ns 183.25 ns 185.82 ns]
                        change: [+5.8928% +6.8380% +7.7802%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe


that said, I'm on a laptop, so those might not be real performance hits. By all means check; I just wanted to call it out. If they are real, it's up to you whether you want me to get rid of the refactor. I think it's cleaner and easier to read the way I have it, but it's not my codebase.

@robinst robinst left a comment

Owner

Some comments, but overall looks good. Note that the rustfmt check failed.

I don't think the benchmarks would be affected that much because of an additional case for [ in the loop, so it might just be noise. But I'll run the benchmarks on my side when I have some time.

Comment thread src/domains.rs
Comment thread src/domains.rs
- re-adds the comment I accidentally deleted
- adds a comment for the continue statement as requested by the maintainer
- reruns the cargo fmt check and applies changes.