Optimize redirect collection #166

petrovicnemanja · 2025-03-20T15:16:54Z

Replaced the Dictionary for URL lookups with a SortedDictionary that sorts URLs in reverse order (Z-A). Removed the _quickLookupTable and updated methods to work with the new _redirectsZA structure. Added ReverseStringComparer class for custom string comparison.

Primary motivation for the change were frequent high CPU reports, whose .netperf files invariably pointed to CustomRedirectCollection.CreateSubSegmentRedirect as the hot path(~99% of Total CPU). Possibly related to issue #148.

All tests are passing.

Replaced the Dictionary for URL lookups with a SortedDictionary that sorts URLs in reverse order (Z-A). Removed the _quickLookupTable and updated methods to work with the new _redirectsZA structure. Added ReverseStringComparer class for custom string comparison.

gemini-code-assist · 2025-03-20T15:16:58Z

Warning

You have reached your daily quota limit. As a reminder, free tier users are limited to 5 requests per day. Please wait up to 24 hours and I will start processing your requests again!

petrovicnemanja · 2025-03-26T09:04:52Z

/gemini summary

gemini-code-assist · 2025-03-26T09:05:01Z

Summary of Changes

This pull request addresses a performance issue related to high CPU usage in the CustomRedirectCollection.CreateSubSegmentRedirect method. The author, petrovicnemanja, replaced the Dictionary used for URL lookups with a SortedDictionary that sorts URLs in reverse order (Z-A). The _quickLookupTable was removed, and methods were updated to work with the new _redirectsZA structure. A ReverseStringComparer class was added for custom string comparison. All tests are passing.

Highlights

Performance Optimization: Replaced Dictionary with SortedDictionary to improve URL lookup performance and reduce CPU usage.
Data Structure Change: Removed _quickLookupTable and _redirectsZACache, utilizing _redirectsZA for storing redirects in reverse order.
Custom Comparer: Implemented ReverseStringComparer for Z-A string comparison in the SortedDictionary.

Changelog

src/Geta.NotFoundHandler/Core/Redirects/CustomRedirectCollection.cs
- Replaced _quickLookupTable (Dictionary) with _redirectsZA (SortedDictionary) for URL lookups (lines 24, 42, 77, 83, 107).
- Removed _redirectsZACache (lines 29, 92, 104, 108).
- Modified Add method to use _redirectsZA instead of _quickLookupTable (lines 77).
- Modified GetEnumerator method to use _redirectsZA (line 37).
- Modified FindInternal method to use _redirectsZA instead of _quickLookupTable and _redirectsZACache (lines 83, 89).
src/Geta.NotFoundHandler/Core/Redirects/ReverseStringComparer.cs
- Added ReverseStringComparer class to perform reverse string comparison for the SortedDictionary.

Activity

gemini-code-assist[bot] hit the daily quota limit on 2025-03-20.
petrovicnemanja requested a summary on 2025-03-26.

petrovicnemanja · 2025-03-26T09:06:04Z

/gemini review

gemini-code-assist

Code Review

This pull request aims to optimize redirect collection by replacing the Dictionary with a SortedDictionary and removing the quick lookup table. The changes seem well-structured and address the performance issue mentioned in the description. However, there are a few points that need further consideration.

Summary of Findings

Potential Thread Safety Issues: The FindInternal method still creates a local copy of _redirectsZA using ToArray(). While this avoids multi-threading issues related to modification, it doesn't prevent issues if _redirectsZA is modified after the array is created but before it's used. This could lead to inconsistent results. Consider using a lock or concurrent collection to ensure thread safety.
Redundant URL Decoding: The HttpUtility.UrlDecode method is called multiple times for the same URL in FindInternal. This could be optimized by decoding the URL once and reusing the decoded value.

Merge Readiness

The changes in this pull request seem promising and address the performance concerns. However, the potential thread safety issues in FindInternal need to be addressed before merging. I am unable to directly approve this pull request, and recommend that others review and approve this code before merging. At a minimum, the high severity issues should be addressed before merging.

gemini-code-assist · 2025-03-26T09:06:16Z

src/Geta.NotFoundHandler/Core/Redirects/CustomRedirectCollection.cs

        // working with local copy to avoid multi-threading issues
-        var redirectsZA = _redirectsZACache;
-        if (redirectsZA == null)
-        {
-            redirectsZA = _quickLookupTable.OrderByDescending(x => x.Key, StringComparer.OrdinalIgnoreCase).ToArray();
-            _redirectsZACache = redirectsZA;
-        }
+        var redirectsZA = _redirectsZA.ToArray();


While creating a local copy of _redirectsZA using ToArray() addresses some multi-threading concerns, it doesn't fully prevent issues if _redirectsZA is modified after the array is created but before it's used. This could lead to inconsistent results. Consider using a lock or a concurrent collection to ensure thread safety.

gemini-code-assist · 2025-03-26T09:06:16Z

src/Geta.NotFoundHandler/Core/Redirects/CustomRedirectCollection.cs

    private CustomRedirect FindInternal(string url)
    {
        url = HttpUtility.UrlDecode(url) ?? string.Empty;


The url is decoded here, and then the decoded value is used in _redirectsZA.TryGetValue(url, out var redirect). However, the url is decoded again in the HttpUtility.UrlDecode(customRedirect.OldUrl) call in the Add method. It might be more efficient to decode the URL only once and reuse the decoded value.

kaspars-ozols · 2025-04-03T13:34:53Z

src/Geta.NotFoundHandler/Core/Redirects/CustomRedirectCollection.cs

-    /// Cache of URLs sorted ZA for look up of partially matched URLs
-    /// </summary>
-    private KeyValuePair<string, CustomRedirect>[] _redirectsZACache;
+    private readonly SortedDictionary<string, CustomRedirect> _redirectsZA = new(new ReverseStringComparer());


Switching from Dictionary (O(1) lookup) to SortedDictionary (O(log n)) might actually hurt performance, especially since:

initialization would take more time and be more processing intensive (as it would have to remain sorted at all times)

the main use case involves frequent lookups

the performance would be dependent on the collection size

https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2?view=net-9.0
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.sorteddictionary-2?view=net-9.0

kaspars-ozols · 2025-04-03T13:37:13Z

src/Geta.NotFoundHandler/Core/Redirects/CustomRedirectCollection.cs

-            redirectsZA = _quickLookupTable.OrderByDescending(x => x.Key, StringComparer.OrdinalIgnoreCase).ToArray();
-            _redirectsZACache = redirectsZA;
-        }
+        var redirectsZA = _redirectsZA.ToArray();


This array used to be calculated and cached. With the proposed change, the array is re-created on every call to FindInternal, likely increasing memory usage and garbage collections. FindInternal is expected to be called multiple times during public Find method.

kaspars-ozols

Both CustomRedirectCollection.CreateSubSegmentRedirect methods do not use the dictionary at all. They rely on a sorted array of redirects (which was previously pre-calculated on request and cached until a new redirect was added).

Since the dictionary isn’t involved in this execution path of the methods, it’s unclear how these changes would improve the performance of the given methods.

kaspars-ozols

Eagerly loading all redirects into memory is a deliberate trade-off—accepting slower startup in exchange for faster runtime performance. In most cases, such as with DXP or Azure deployment slots, the application is fully started before going live, which likely influenced this as the default behavior.

A configuration option could be introduced to choose between redirect resolution strategies: preloaded or on-demand. This would make sense for larger datasets, but implementing it would require reworking the lookup logic to support partial or lazy loading of redirects.

kaspars-ozols

It would be essential to benchmark performance before and after the proposed changes. This would help confirm whether the changes actually improve redirect resolution speed or if they introduce regressions—particularly in lookup performance, memory usage, and initialization time.

Given that the new implementation uses a SortedDictionary (with O(log n) lookups and slower inserts), and recreates arrays on every call rather than caching them, it’s unclear how these changes benefit performance without measurable data to support them. A benchmark would provide clarity and help guide whether this trade-off is worthwhile.

petrovicnemanja · 2025-04-03T15:41:26Z

Thanks for your feedback @kaspars-ozols. Looking at the PR again after two weeks and with your comments gives me a better perspective than I had when I made these changes. I agree that such fundamental changes as proposed here should not be evaluated without accurate metrics, so I will close the ticket and reopen if and when we gather useful performance data.

However, I feel this PR may have uncovered a potential use case for opening up the CustomRedirectCollection for extensibility/customization. Would you be open to a PR that introduces an ICustomRedirectCollection interface? E.g.

public interface ICustomRedirectCollection : IEnumerable<CustomRedirect>
{
    void Add(CustomRedirect customRedirect);
    CustomRedirect Find(Uri urlNotFound);
}

Accompanied by a factory to replace the instantiations using new(), e.g.

public interface ICustomRedirectCollectionFactory
{
    ICustomRedirectCollection Create();
    ICustomRedirectCollection Create(IEnumerable<INotFoundHandler> providers);
}

kaspars-ozols · 2025-04-03T20:08:51Z

I’d suggest taking a different approach:

Remove the Find method from CustomRedirectCollection, treating it purely as a data structure that holds a plain list of redirects. This simplifies its purpose and avoids having to deal with calls to new CustomRedirectCollection() throughout the codebase. This is the actual intent in multiple places (to load redirects from file and store them in memory). You wouldn’t even need a constructor with _providers, nor would you have to worry about caching.
Move the current Find logic into CustomRedirectHandler.Find, and introduce caching there. The providers could be injected since the class already gets built using DI. I assume the Set method is called once all redirects are loaded or when the underlying collection is updated.
Update the tests to validate the Find method within CustomRedirectHandler, rather than CustomRedirectCollection.

That way you could have the minimal changes to the code and allow customization / replacement of IRedirectHandler while not introducing new concepts.

valdisiljuconoks · 2025-04-03T20:22:28Z

@petrovicnemanja do you know data volume (number of redirects) in the perf session?

valdisiljuconoks · 2025-04-03T20:26:33Z

there are also some other techniques for lookup available - like bucketing of redirect entities. which could potentially reduce data size to lookup "into".

but as usual with everything perf. related - data needed, measurements needed..

petrovicnemanja · 2025-04-04T07:01:15Z

@petrovicnemanja do you know data volume (number of redirects) in the perf session?

@valdisiljuconoks Not precisely, but fortunately we didn't purge the redirects table, so I can give you the current figure which should be relatively comparable, ~18k rows.

petrovicnemanja · 2025-04-04T07:40:08Z

I’d suggest taking a different approach:

Remove the Find method from CustomRedirectCollection, treating it purely as a data structure that holds a plain list of redirects. This simplifies its purpose and avoids having to deal with calls to new CustomRedirectCollection() throughout the codebase. This is the actual intent in multiple places (to load redirects from file and store them in memory). You wouldn’t even need a constructor with _providers, nor would you have to worry about caching.

Move the current Find logic into CustomRedirectHandler.Find, and introduce caching there. The providers could be injected since the class already gets built using DI. I assume the Set method is called once all redirects are loaded or when the underlying collection is updated.

Update the tests to validate the Find method within CustomRedirectHandler, rather than CustomRedirectCollection.

That way you could have the minimal changes to the code and allow customization / replacement of IRedirectHandler while not introducing new concepts.

Thank you for this thoughtful suggestion @kaspars-ozols. I appreciate the detailed approach you've outlined.

I tried doing what you suggested but found myself struggling with the broader implications across the codebase. As a relatively new contributor, I don't yet have the comprehensive understanding of how all the components interact to implement this refactoring confidently.

Would you be open to creating a separate issue for this improvement? It deserves proper attention from someone with deeper knowledge of the system. I'm happy to create the issue myself if you prefer, but I think your detailed explanation would make for a great starting point.

petrovicnemanja added 2 commits March 20, 2025 15:52

Modify _redirectsZA summary

3cb07c3

petrovicnemanja changed the title ~~Feature/optimize redirect collection~~ Optimize redirect collection Mar 20, 2025

gemini-code-assist bot suggested changes Mar 26, 2025

View reviewed changes

petrovicnemanja requested review from marisks and valdisiljuconoks April 3, 2025 08:50

kaspars-ozols reviewed Apr 3, 2025

View reviewed changes

kaspars-ozols suggested changes Apr 3, 2025

View reviewed changes

petrovicnemanja closed this Apr 3, 2025

Optimize redirect collection #166

Optimize redirect collection #166

Uh oh!

Conversation

petrovicnemanja commented Mar 20, 2025

Uh oh!

gemini-code-assist bot commented Mar 20, 2025

Uh oh!

petrovicnemanja commented Mar 26, 2025

Uh oh!

gemini-code-assist bot commented Mar 26, 2025

Summary of Changes

Highlights

Changelog

Activity

Uh oh!

petrovicnemanja commented Mar 26, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Summary of Findings

Merge Readiness

Uh oh!

gemini-code-assist bot Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

kaspars-ozols Apr 3, 2025

Choose a reason for hiding this comment

Uh oh!

kaspars-ozols Apr 3, 2025

Choose a reason for hiding this comment

Uh oh!

kaspars-ozols left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kaspars-ozols left a comment

Choose a reason for hiding this comment

Uh oh!

kaspars-ozols left a comment

Choose a reason for hiding this comment

Uh oh!

petrovicnemanja commented Apr 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kaspars-ozols commented Apr 3, 2025

Uh oh!

valdisiljuconoks commented Apr 3, 2025

Uh oh!

valdisiljuconoks commented Apr 3, 2025

Uh oh!

petrovicnemanja commented Apr 4, 2025

Uh oh!

petrovicnemanja commented Apr 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kaspars-ozols left a comment •

edited

Loading

petrovicnemanja commented Apr 3, 2025 •

edited

Loading