Skip to content

Conversation

@petrovicnemanja
Copy link

Replaced the Dictionary for URL lookups with a SortedDictionary that sorts URLs in reverse order (Z-A). Removed the _quickLookupTable and updated methods to work with the new _redirectsZA structure. Added ReverseStringComparer class for custom string comparison.

Primary motivation for the change were frequent high CPU reports, whose .netperf files invariably pointed to CustomRedirectCollection.CreateSubSegmentRedirect as the hot path(~99% of Total CPU). Possibly related to issue #148.

All tests are passing.

image

Replaced the Dictionary for URL lookups with a SortedDictionary that sorts URLs in reverse order (Z-A). Removed the _quickLookupTable and updated methods to work with the new _redirectsZA structure. Added ReverseStringComparer class for custom string comparison.
@gemini-code-assist
Copy link

Warning

You have reached your daily quota limit. As a reminder, free tier users are limited to 5 requests per day. Please wait up to 24 hours and I will start processing your requests again!

@petrovicnemanja petrovicnemanja changed the title Feature/optimize redirect collection Optimize redirect collection Mar 20, 2025
@petrovicnemanja
Copy link
Author

/gemini summary

@gemini-code-assist
Copy link

Summary of Changes

This pull request addresses a performance issue related to high CPU usage in the CustomRedirectCollection.CreateSubSegmentRedirect method. The author, petrovicnemanja, replaced the Dictionary used for URL lookups with a SortedDictionary that sorts URLs in reverse order (Z-A). The _quickLookupTable was removed, and methods were updated to work with the new _redirectsZA structure. A ReverseStringComparer class was added for custom string comparison. All tests are passing.

Highlights

  • Performance Optimization: Replaced Dictionary with SortedDictionary to improve URL lookup performance and reduce CPU usage.
  • Data Structure Change: Removed _quickLookupTable and _redirectsZACache, utilizing _redirectsZA for storing redirects in reverse order.
  • Custom Comparer: Implemented ReverseStringComparer for Z-A string comparison in the SortedDictionary.

Changelog

  • src/Geta.NotFoundHandler/Core/Redirects/CustomRedirectCollection.cs
    • Replaced _quickLookupTable (Dictionary) with _redirectsZA (SortedDictionary) for URL lookups (lines 24, 42, 77, 83, 107).
    • Removed _redirectsZACache (lines 29, 92, 104, 108).
    • Modified Add method to use _redirectsZA instead of _quickLookupTable (lines 77).
    • Modified GetEnumerator method to use _redirectsZA (line 37).
    • Modified FindInternal method to use _redirectsZA instead of _quickLookupTable and _redirectsZACache (lines 83, 89).
  • src/Geta.NotFoundHandler/Core/Redirects/ReverseStringComparer.cs
    • Added ReverseStringComparer class to perform reverse string comparison for the SortedDictionary.

Activity

  • gemini-code-assist[bot] hit the daily quota limit on 2025-03-20.
  • petrovicnemanja requested a summary on 2025-03-26.

@petrovicnemanja
Copy link
Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to optimize redirect collection by replacing the Dictionary with a SortedDictionary and removing the quick lookup table. The changes seem well-structured and address the performance issue mentioned in the description. However, there are a few points that need further consideration.

Summary of Findings

  • Potential Thread Safety Issues: The FindInternal method still creates a local copy of _redirectsZA using ToArray(). While this avoids multi-threading issues related to modification, it doesn't prevent issues if _redirectsZA is modified after the array is created but before it's used. This could lead to inconsistent results. Consider using a lock or concurrent collection to ensure thread safety.
  • Redundant URL Decoding: The HttpUtility.UrlDecode method is called multiple times for the same URL in FindInternal. This could be optimized by decoding the URL once and reusing the decoded value.

Merge Readiness

The changes in this pull request seem promising and address the performance concerns. However, the potential thread safety issues in FindInternal need to be addressed before merging. I am unable to directly approve this pull request, and recommend that others review and approve this code before merging. At a minimum, the high severity issues should be addressed before merging.

Comment on lines 88 to +89
// working with local copy to avoid multi-threading issues
var redirectsZA = _redirectsZACache;
if (redirectsZA == null)
{
redirectsZA = _quickLookupTable.OrderByDescending(x => x.Key, StringComparer.OrdinalIgnoreCase).ToArray();
_redirectsZACache = redirectsZA;
}
var redirectsZA = _redirectsZA.ToArray();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While creating a local copy of _redirectsZA using ToArray() addresses some multi-threading concerns, it doesn't fully prevent issues if _redirectsZA is modified after the array is created but before it's used. This could lead to inconsistent results. Consider using a lock or a concurrent collection to ensure thread safety.

Comment on lines 80 to 82
private CustomRedirect FindInternal(string url)
{
url = HttpUtility.UrlDecode(url) ?? string.Empty;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The url is decoded here, and then the decoded value is used in _redirectsZA.TryGetValue(url, out var redirect). However, the url is decoded again in the HttpUtility.UrlDecode(customRedirect.OldUrl) call in the Add method. It might be more efficient to decode the URL only once and reuse the decoded value.

/// Cache of URLs sorted ZA for look up of partially matched URLs
/// </summary>
private KeyValuePair<string, CustomRedirect>[] _redirectsZACache;
private readonly SortedDictionary<string, CustomRedirect> _redirectsZA = new(new ReverseStringComparer());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching from Dictionary (O(1) lookup) to SortedDictionary (O(log n)) might actually hurt performance, especially since:

  • initialization would take more time and be more processing intensive (as it would have to remain sorted at all times)
  • the main use case involves frequent lookups
  • the performance would be dependent on the collection size

https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2?view=net-9.0
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.sorteddictionary-2?view=net-9.0

redirectsZA = _quickLookupTable.OrderByDescending(x => x.Key, StringComparer.OrdinalIgnoreCase).ToArray();
_redirectsZACache = redirectsZA;
}
var redirectsZA = _redirectsZA.ToArray();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This array used to be calculated and cached. With the proposed change, the array is re-created on every call to FindInternal, likely increasing memory usage and garbage collections. FindInternal is expected to be called multiple times during public Find method.

Copy link
Contributor

@kaspars-ozols kaspars-ozols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both CustomRedirectCollection.CreateSubSegmentRedirect methods do not use the dictionary at all. They rely on a sorted array of redirects (which was previously pre-calculated on request and cached until a new redirect was added).

Since the dictionary isn’t involved in this execution path of the methods, it’s unclear how these changes would improve the performance of the given methods.

Copy link
Contributor

@kaspars-ozols kaspars-ozols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eagerly loading all redirects into memory is a deliberate trade-off—accepting slower startup in exchange for faster runtime performance. In most cases, such as with DXP or Azure deployment slots, the application is fully started before going live, which likely influenced this as the default behavior.

A configuration option could be introduced to choose between redirect resolution strategies: preloaded or on-demand. This would make sense for larger datasets, but implementing it would require reworking the lookup logic to support partial or lazy loading of redirects.

Copy link
Contributor

@kaspars-ozols kaspars-ozols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be essential to benchmark performance before and after the proposed changes. This would help confirm whether the changes actually improve redirect resolution speed or if they introduce regressions—particularly in lookup performance, memory usage, and initialization time.

Given that the new implementation uses a SortedDictionary (with O(log n) lookups and slower inserts), and recreates arrays on every call rather than caching them, it’s unclear how these changes benefit performance without measurable data to support them. A benchmark would provide clarity and help guide whether this trade-off is worthwhile.

@petrovicnemanja
Copy link
Author

petrovicnemanja commented Apr 3, 2025

Thanks for your feedback @kaspars-ozols. Looking at the PR again after two weeks and with your comments gives me a better perspective than I had when I made these changes. I agree that such fundamental changes as proposed here should not be evaluated without accurate metrics, so I will close the ticket and reopen if and when we gather useful performance data.

However, I feel this PR may have uncovered a potential use case for opening up the CustomRedirectCollection for extensibility/customization. Would you be open to a PR that introduces an ICustomRedirectCollection interface? E.g.

public interface ICustomRedirectCollection : IEnumerable<CustomRedirect>
{
    void Add(CustomRedirect customRedirect);
    CustomRedirect Find(Uri urlNotFound);
}

Accompanied by a factory to replace the instantiations using new(), e.g.

public interface ICustomRedirectCollectionFactory
{
    ICustomRedirectCollection Create();
    ICustomRedirectCollection Create(IEnumerable<INotFoundHandler> providers);
}

@kaspars-ozols
Copy link
Contributor

I’d suggest taking a different approach:

  • Remove the Find method from CustomRedirectCollection, treating it purely as a data structure that holds a plain list of redirects. This simplifies its purpose and avoids having to deal with calls to new CustomRedirectCollection() throughout the codebase. This is the actual intent in multiple places (to load redirects from file and store them in memory). You wouldn’t even need a constructor with _providers, nor would you have to worry about caching.

  • Move the current Find logic into CustomRedirectHandler.Find, and introduce caching there. The providers could be injected since the class already gets built using DI. I assume the Set method is called once all redirects are loaded or when the underlying collection is updated.

  • Update the tests to validate the Find method within CustomRedirectHandler, rather than CustomRedirectCollection.

That way you could have the minimal changes to the code and allow customization / replacement of IRedirectHandler while not introducing new concepts.

@valdisiljuconoks
Copy link
Member

@petrovicnemanja do you know data volume (number of redirects) in the perf session?

@valdisiljuconoks
Copy link
Member

there are also some other techniques for lookup available - like bucketing of redirect entities. which could potentially reduce data size to lookup "into".

but as usual with everything perf. related - data needed, measurements needed..

@petrovicnemanja
Copy link
Author

@petrovicnemanja do you know data volume (number of redirects) in the perf session?

@valdisiljuconoks Not precisely, but fortunately we didn't purge the redirects table, so I can give you the current figure which should be relatively comparable, ~18k rows.

@petrovicnemanja
Copy link
Author

I’d suggest taking a different approach:

  • Remove the Find method from CustomRedirectCollection, treating it purely as a data structure that holds a plain list of redirects. This simplifies its purpose and avoids having to deal with calls to new CustomRedirectCollection() throughout the codebase. This is the actual intent in multiple places (to load redirects from file and store them in memory). You wouldn’t even need a constructor with _providers, nor would you have to worry about caching.
  • Move the current Find logic into CustomRedirectHandler.Find, and introduce caching there. The providers could be injected since the class already gets built using DI. I assume the Set method is called once all redirects are loaded or when the underlying collection is updated.
  • Update the tests to validate the Find method within CustomRedirectHandler, rather than CustomRedirectCollection.

That way you could have the minimal changes to the code and allow customization / replacement of IRedirectHandler while not introducing new concepts.

Thank you for this thoughtful suggestion @kaspars-ozols. I appreciate the detailed approach you've outlined.

I tried doing what you suggested but found myself struggling with the broader implications across the codebase. As a relatively new contributor, I don't yet have the comprehensive understanding of how all the components interact to implement this refactoring confidently.

Would you be open to creating a separate issue for this improvement? It deserves proper attention from someone with deeper knowledge of the system. I'm happy to create the issue myself if you prefer, but I think your detailed explanation would make for a great starting point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants