feature: support rfc-compliant bracketed ipv6 addresses in URLs. #106
andrewsuperlegit wants to merge 5 commits into
Updates URL parsing to support RFC 3986 and RFC 6874 bracketed IPv6 addresses within a valid URL scheme.

- Replaces the `for` loop in `find_authority_end` with a `while let` loop to handle IPv6 extraction, which apparently allows Rust to use a jump table.
- When an open bracket is encountered, standard domain validation is suspended. The parser validates IPv6 characters, including zone identifiers (introduced by a percent sign), until the closing bracket is found.
- Maintains near-zero performance regression on standard plaintext by ensuring normal characters bypass the IPv6 branch.
- Adds dedicated IPv6 benchmark cases and a substantial number of IPv6 unit tests.
- Centralizes some testing utilities into `tests/common/mod.rs` to avoid duplicating test logic.
- Clippy remains undefeated.

NOTE: this adds IPv6 addresses within valid schemes. Schemeless bracketed IPs are intentionally excluded to prevent performance regressions on standard text.
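The control flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `authority_end_sketch` is a hypothetical name, the accepted domain character set is deliberately minimal, and the zone identifier is treated as a bare `%` followed by unreserved characters (percent-encoded sequences are not handled).

```rust
/// Hypothetical sketch, not the crate's `find_authority_end`.
/// `s` is the text following "scheme://". Returns the byte index just past
/// the authority, or `None` if there is no authority or a bracketed host
/// is malformed.
fn authority_end_sketch(s: &str) -> Option<usize> {
    let mut chars = s.char_indices();
    let mut end = 0;
    // `while let` rather than `for`, so the bracket branch below can keep
    // consuming from the same iterator.
    while let Some((i, c)) = chars.next() {
        match c {
            '[' => {
                // Inside brackets, normal domain validation is suspended.
                let mut in_zone = false;
                let mut closed = false;
                for (j, b) in chars.by_ref() {
                    match b {
                        // Closing bracket; the guard rejects an empty "[]".
                        ']' if j > i + 1 => {
                            end = j + 1;
                            closed = true;
                            break;
                        }
                        // First '%' starts the zone identifier.
                        '%' if !in_zone => in_zone = true,
                        // Zone identifier: unreserved characters only (assumption).
                        _ if in_zone
                            && (b.is_ascii_alphanumeric()
                                || matches!(b, '-' | '.' | '_' | '~')) => {}
                        // Address part: hex digits, ':' and '.' (embedded IPv4).
                        _ if !in_zone && (b.is_ascii_hexdigit() || b == ':' || b == '.') => {}
                        _ => return None,
                    }
                }
                if !closed {
                    return None;
                }
            }
            // Ordinary host and port characters never touch the bracket branch.
            'a'..='z' | 'A'..='Z' | '0'..='9' | '-' | '.' | ':' => end = i + 1,
            _ => break,
        }
    }
    if end == 0 { None } else { Some(end) }
}
```

Because ordinary characters are matched in the outer `match` without entering the bracket arm, plain hosts such as `example.com` take the same path they would without IPv6 support.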
I think I'm going to do a bit of a refactor to clean up the contribution. Let me know what you think of it overall, but I'm going to extract my portion into something that's a little easier to read.
… that describes what is happening. Also moves the IPv6 character matching and zone_id matching into their own helper functions for legibility.
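As an illustration of the kind of helper extraction this commit message describes, a minimal sketch might look like the following. The names `is_ipv6_char`, `is_zone_id_char`, and `is_bracketed_body_valid` are hypothetical and are not taken from the PR; the zone identifier character set is assumed to be the RFC 3986 "unreserved" set, with percent-encoded sequences left out for brevity.

```rust
/// Hypothetical helper: characters allowed in the address part of a
/// bracketed IPv6 literal (hex digits, ':' separators, '.' for an
/// embedded IPv4 tail).
fn is_ipv6_char(c: char) -> bool {
    c.is_ascii_hexdigit() || matches!(c, ':' | '.')
}

/// Hypothetical helper: characters allowed after the '%' zone delimiter.
/// Assumes the "unreserved" set; percent-encoded sequences are not handled.
fn is_zone_id_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '-' | '.' | '_' | '~')
}

/// Checks the text between '[' and ']' using the two predicates above.
/// This validates characters only; it does not check that the address
/// is a well-formed IPv6 address.
fn is_bracketed_body_valid(body: &str) -> bool {
    // Split at the first '%' into address and optional zone identifier.
    let (addr, zone) = match body.split_once('%') {
        Some((a, z)) => (a, Some(z)),
        None => (body, None),
    };
    !addr.is_empty()
        && addr.chars().all(is_ipv6_char)
        && zone.map_or(true, |z| !z.is_empty() && z.chars().all(is_zone_id_char))
}
```

Keeping the predicates as small standalone functions lets the main loop read as a sequence of named checks, which is the readability trade-off discussed in this thread.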
6a85bd7 to af190c7
Full disclosure: after the refactor I noticed the following performance regressions:

Gnuplot not found, using plotters backend
some_links time: [142.94 ns 143.55 ns 144.27 ns]
heaps_of_links time: [517.40 ns 521.46 ns 526.98 ns]
some_links_without_scheme

That said, I'm on a laptop, so those might not be real performance hits. By all means check; I just wanted to call it out. If they are real, it's up to you whether you want me to get rid of the refactor. I think it's cleaner and easier to read the way I have it, but it's not my codebase.
Some comments, but overall looks good. Note that the rustfmt check failed.
I don't think the benchmarks would be affected that much by an additional case for `[` in the loop, so it might just be noise. But I'll run the benchmarks on my side when I have some time.
- Re-adds the comment I accidentally deleted.
- Adds a comment for the `continue` statement, as requested by the maintainer.
- Reruns the cargo fmt check and applies the changes.
@robinst
fix for issue 78