-
Notifications
You must be signed in to change notification settings - Fork 20
Optimize redirect collection #166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Replaced the Dictionary for URL lookups with a SortedDictionary that sorts URLs in reverse order (Z-A). Removed the _quickLookupTable and updated methods to work with the new _redirectsZA structure. Added ReverseStringComparer class for custom string comparison.
|
Warning You have reached your daily quota limit. As a reminder, free tier users are limited to 5 requests per day. Please wait up to 24 hours and I will start processing your requests again! |
|
/gemini summary |
Summary of ChangesThis pull request addresses a performance issue related to high CPU usage in the Highlights
Changelog
Activity
|
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to optimize redirect collection by replacing the Dictionary with a SortedDictionary and removing the quick lookup table. The changes seem well-structured and address the performance issue mentioned in the description. However, there are a few points that need further consideration.
Summary of Findings
- Potential Thread Safety Issues: The
FindInternalmethod still creates a local copy of_redirectsZAusingToArray(). While this avoids multi-threading issues related to modification, it doesn't prevent issues if_redirectsZAis modified after the array is created but before it's used. This could lead to inconsistent results. Consider using a lock or concurrent collection to ensure thread safety. - Redundant URL Decoding: The
HttpUtility.UrlDecodemethod is called multiple times for the same URL inFindInternal. This could be optimized by decoding the URL once and reusing the decoded value.
Merge Readiness
The changes in this pull request seem promising and address the performance concerns. However, the potential thread safety issues in FindInternal need to be addressed before merging. I am unable to directly approve this pull request, and recommend that others review and approve this code before merging. At a minimum, the high severity issues should be addressed before merging.
| // working with local copy to avoid multi-threading issues | ||
| var redirectsZA = _redirectsZACache; | ||
| if (redirectsZA == null) | ||
| { | ||
| redirectsZA = _quickLookupTable.OrderByDescending(x => x.Key, StringComparer.OrdinalIgnoreCase).ToArray(); | ||
| _redirectsZACache = redirectsZA; | ||
| } | ||
| var redirectsZA = _redirectsZA.ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While creating a local copy of _redirectsZA using ToArray() addresses some multi-threading concerns, it doesn't fully prevent issues if _redirectsZA is modified after the array is created but before it's used. This could lead to inconsistent results. Consider using a lock or a concurrent collection to ensure thread safety.
| private CustomRedirect FindInternal(string url) | ||
| { | ||
| url = HttpUtility.UrlDecode(url) ?? string.Empty; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The url is decoded here, and then the decoded value is used in _redirectsZA.TryGetValue(url, out var redirect). However, the url is decoded again in the HttpUtility.UrlDecode(customRedirect.OldUrl) call in the Add method. It might be more efficient to decode the URL only once and reuse the decoded value.
| /// Cache of URLs sorted ZA for look up of partially matched URLs | ||
| /// </summary> | ||
| private KeyValuePair<string, CustomRedirect>[] _redirectsZACache; | ||
| private readonly SortedDictionary<string, CustomRedirect> _redirectsZA = new(new ReverseStringComparer()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching from Dictionary (O(1) lookup) to SortedDictionary (O(log n)) might actually hurt performance, especially since:
- initialization would take more time and be more processing intensive (as it would have to remain sorted at all times)
- the main use case involves frequent lookups
- the performance would be dependent on the collection size
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2?view=net-9.0
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.sorteddictionary-2?view=net-9.0
| redirectsZA = _quickLookupTable.OrderByDescending(x => x.Key, StringComparer.OrdinalIgnoreCase).ToArray(); | ||
| _redirectsZACache = redirectsZA; | ||
| } | ||
| var redirectsZA = _redirectsZA.ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This array used to be calculated and cached. With the proposed change, the array is re-created on every call to FindInternal, likely increasing memory usage and garbage collections. FindInternal is expected to be called multiple times during public Find method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both CustomRedirectCollection.CreateSubSegmentRedirect methods do not use the dictionary at all. They rely on a sorted array of redirects (which was previously pre-calculated on request and cached until a new redirect was added).
Since the dictionary isn’t involved in this execution path of the methods, it’s unclear how these changes would improve the performance of the given methods.
kaspars-ozols
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eagerly loading all redirects into memory is a deliberate trade-off—accepting slower startup in exchange for faster runtime performance. In most cases, such as with DXP or Azure deployment slots, the application is fully started before going live, which likely influenced this as the default behavior.
A configuration option could be introduced to choose between redirect resolution strategies: preloaded or on-demand. This would make sense for larger datasets, but implementing it would require reworking the lookup logic to support partial or lazy loading of redirects.
kaspars-ozols
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be essential to benchmark performance before and after the proposed changes. This would help confirm whether the changes actually improve redirect resolution speed or if they introduce regressions—particularly in lookup performance, memory usage, and initialization time.
Given that the new implementation uses a SortedDictionary (with O(log n) lookups and slower inserts), and recreates arrays on every call rather than caching them, it’s unclear how these changes benefit performance without measurable data to support them. A benchmark would provide clarity and help guide whether this trade-off is worthwhile.
|
Thanks for your feedback @kaspars-ozols. Looking at the PR again after two weeks and with your comments gives me a better perspective than I had when I made these changes. I agree that such fundamental changes as proposed here should not be evaluated without accurate metrics, so I will close the ticket and reopen if and when we gather useful performance data. However, I feel this PR may have uncovered a potential use case for opening up the CustomRedirectCollection for extensibility/customization. Would you be open to a PR that introduces an ICustomRedirectCollection interface? E.g. Accompanied by a factory to replace the instantiations using new(), e.g. |
|
I’d suggest taking a different approach:
That way you could have the minimal changes to the code and allow customization / replacement of IRedirectHandler while not introducing new concepts. |
|
@petrovicnemanja do you know data volume (number of redirects) in the perf session? |
|
there are also some other techniques for lookup available - like bucketing of redirect entities. which could potentially reduce data size to lookup "into". but as usual with everything perf. related - data needed, measurements needed.. |
@valdisiljuconoks Not precisely, but fortunately we didn't purge the redirects table, so I can give you the current figure which should be relatively comparable, ~18k rows. |
Thank you for this thoughtful suggestion @kaspars-ozols. I appreciate the detailed approach you've outlined. I tried doing what you suggested but found myself struggling with the broader implications across the codebase. As a relatively new contributor, I don't yet have the comprehensive understanding of how all the components interact to implement this refactoring confidently. Would you be open to creating a separate issue for this improvement? It deserves proper attention from someone with deeper knowledge of the system. I'm happy to create the issue myself if you prefer, but I think your detailed explanation would make for a great starting point. |
Replaced the Dictionary for URL lookups with a SortedDictionary that sorts URLs in reverse order (Z-A). Removed the _quickLookupTable and updated methods to work with the new _redirectsZA structure. Added ReverseStringComparer class for custom string comparison.
Primary motivation for the change were frequent high CPU reports, whose .netperf files invariably pointed to CustomRedirectCollection.CreateSubSegmentRedirect as the hot path(~99% of Total CPU). Possibly related to issue #148.
All tests are passing.