Don't redo the parsing when normalizing the url.#163
Don't redo the parsing when normalizing the url.#163bjornkellner wants to merge 1 commit intomasterfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes the URL normalization process by avoiding redundant parsing when creating normalized URLs. The main change replaces a call to parse with direct object creation in the normalized method.
- Replaces
parsecall with direct object instantiation in thenormalizedmethod to avoid re-parsing - Extracts public suffix domain parsing logic into a separate private method for reuse
- Updates test expectations and RSpec configuration
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lib/twingly/url.rb | Optimizes normalized method by avoiding re-parsing and extracts public suffix domain logic |
| spec/lib/twingly/url_spec.rb | Updates test expectation for constructor visibility change |
| spec/spec_helper.rb | Adds focus filter configuration for RSpec |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
4c1f92b to
e22e706
Compare
e22e706 to
98eedc3
Compare
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
roback
left a comment
There was a problem hiding this comment.
Had one suggestion.
One "disadvantage" with this change, is that when we now call #normalize, we won't rescue potential PublicSuffix::DomainInvalid errors (since we don't call the .parse method inside the #normalize method anymore, which rescues these errors).
However, I'm not sure if it's possible for that error to get raised in this case, considering the non-normalized URL has already passed through the parse method before.
| normalized_url.path = normalized_path | ||
|
|
||
| self.class.parse(normalized_url) | ||
| public_suffix_domain = self.class.send(:get_public_suffix_domain, normalized_url.host) |
There was a problem hiding this comment.
Instead of calling this private class method here, then passing the result to the .new on the line below, we could make get_public_suffix_domain a private instance method instead and call it inside the initializer.
We still need to call the private .new method, but I don't see how we could get around that.
diff --git a/lib/twingly/url.rb b/lib/twingly/url.rb
index 83a734b..af99fcf 100644
--- a/lib/twingly/url.rb
+++ b/lib/twingly/url.rb
@@ -72,10 +72,7 @@ module Twingly
# URLs that can't be normalized should not be valid
try_addressable_normalize(addressable_uri)
- host = addressable_uri.host
- public_suffix_domain = get_public_suffix_domain(addressable_uri.host)
-
- new(addressable_uri, public_suffix_domain)
+ new(addressable_uri)
rescue *ERRORS_TO_EXTEND => error
error.extend(Twingly::URL::Error)
raise
@@ -120,13 +117,6 @@ module Twingly
label.match?(LETTERS_DIGITS_HYPHEN)
end
- def get_public_suffix_domain(host)
- public_suffix_domain = PublicSuffix.parse(host, list: CUSTOM_PSL, default_rule: nil)
- raise Twingly::URL::Error::ParseError if public_suffix_domain.nil?
- raise Twingly::URL::Error::ParseError if public_suffix_domain.sld.nil?
- public_suffix_domain
- end
-
private :new
private :internal_parse
private :clean_input
@@ -134,12 +124,11 @@ module Twingly
private :try_addressable_normalize
private :valid_hostname?
private :valid_label?
- private :get_public_suffix_domain
end
- def initialize(addressable_uri, public_suffix_domain)
+ def initialize(addressable_uri)
@addressable_uri = addressable_uri
- @public_suffix_domain = public_suffix_domain
+ @public_suffix_domain = get_public_suffix_domain(addressable_uri.host)
end
def scheme
@@ -193,8 +182,7 @@ module Twingly
normalized_url.host = normalized_host
normalized_url.path = normalized_path
- public_suffix_domain = self.class.send(:get_public_suffix_domain, normalized_url.host)
- self.class.send(:new, normalized_url, public_suffix_domain)
+ self.class.send(:new, normalized_url)
end
def normalized_scheme
@@ -262,6 +250,13 @@ module Twingly
attr_reader :addressable_uri, :public_suffix_domain
+ def get_public_suffix_domain(host)
+ public_suffix_domain = PublicSuffix.parse(host, list: CUSTOM_PSL, default_rule: nil)
+ raise Twingly::URL::Error::ParseError if public_suffix_domain.nil?
+ raise Twingly::URL::Error::ParseError if public_suffix_domain.sld.nil?
+ public_suffix_domain
+ end
+
def normalize_blogspot(host, domain)
if domain.sld.downcase == "blogspot"
host.sub(STARTS_WITH_WWW, "").sub(/#{domain.tld}\z/i, "com")
The parsing isn't redone when calling normalized.
I don't want to make any big changes in the this gem right now. But this is minimal change that will do a significant speedup with what I consider a low risk of introducing changed behavior/bugs.