Copilot AI commented Jan 15, 2026

MongoDB queries for SCIM groups with 20k+ members take 2+ minutes because filters are applied after an expensive join, which forces a COLLSCAN over 368k documents.

Changes

Filter evaluation refactored to work directly on base collection:

  • Added EvaluateMongoDbAttributesDirect() method operating on SCIMRepresentationAttribute instead of EnrichedAttribute
  • Implemented EvaluateAttributesDirect() extension methods for all expression types (Attribute, Logical, Comparison)
  • Removed expensive join from FindSCIMRepresentations() - filters now apply to base query first
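As a rough illustration of what "operating directly on the base type" could look like, the sketch below translates a small filter tree into a predicate over the attribute type itself, with no wrapper object in between. It is not the code in this PR: Attr, FilterExpr, EqualsExpr, LogicalExpr and DirectFilter are hypothetical stand-ins, and only an equality comparison and and/or nodes are modelled.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for SCIMRepresentationAttribute; only two fields are modelled.
public class Attr
{
    public string SchemaAttributeId { get; set; } = "";
    public string ValueString { get; set; } = "";
}

// Hypothetical filter tree: one comparison node type and one logical node type.
public abstract record FilterExpr;
public record EqualsExpr(string Field, string Value) : FilterExpr;
public record LogicalExpr(bool IsAnd, FilterExpr Left, FilterExpr Right) : FilterExpr;

public static class DirectFilter
{
    // Translates a filter tree into a predicate over Attr itself, so the
    // predicate can be applied to the base query before any join.
    public static Expression<Func<Attr, bool>> ToPredicate(FilterExpr filter)
    {
        var parameter = Expression.Parameter(typeof(Attr), "a");
        return Expression.Lambda<Func<Attr, bool>>(Build(filter, parameter), parameter);
    }

    private static Expression Build(FilterExpr filter, ParameterExpression parameter)
    {
        switch (filter)
        {
            case EqualsExpr eq:
                // a.<Field> == <Value>
                return Expression.Equal(
                    Expression.Property(parameter, eq.Field),
                    Expression.Constant(eq.Value));
            case LogicalExpr logical:
                // Recurse into both operands and combine them with && or ||.
                var left = Build(logical.Left, parameter);
                var right = Build(logical.Right, parameter);
                return logical.IsAnd
                    ? Expression.AndAlso(left, right)
                    : Expression.OrElse(left, right);
            default:
                throw new NotSupportedException(filter.GetType().Name);
        }
    }
}
```

The resulting expression can be passed straight to Where() on an IQueryable<Attr>. Whether a given LINQ provider translates it into a leading $match depends on the provider, so that part remains an assumption here.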

Before:

var filteredRepresentationAttributes = from a in _scimDbContext.SCIMRepresentationAttributeLst.AsQueryable()
    join b in _scimDbContext.SCIMRepresentationAttributeLst.AsQueryable() on a.ParentAttributeId equals b.Id into Parents
    select new EnrichedAttribute { Attribute = a, Parent = Parents.First() };
filteredRepresentationAttributes = parameter.Filter.EvaluateMongoDbAttributes(filteredRepresentationAttributes);

After:

var filteredRepresentationAttributes = _scimDbContext.SCIMRepresentationAttributeLst.AsQueryable();
filteredRepresentationAttributes = parameter.Filter.EvaluateMongoDbAttributesDirect(filteredRepresentationAttributes);

The generated MongoDB pipeline changes from $project → $lookup → $project → $match to $match → ..., which lets MongoDB use indexes on SchemaAttributeId and the value fields. Query time is expected to drop from 2+ minutes to ~500ms.
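The reordering can be illustrated without a database. The sketch below runs both orderings over an in-memory IQueryable and returns the same ids either way; the difference is only how many rows reach the join. AttributeRow and QueryOrdering are hypothetical names, the row type is a trimmed-down stand-in for SCIMRepresentationAttribute, and FirstOrDefault() is used instead of First() so that rows without a parent do not throw in LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for SCIMRepresentationAttribute.
// An empty ParentAttributeId marks a root attribute in this sketch.
public record AttributeRow(string Id, string ParentAttributeId, string SchemaAttributeId, string ValueString);

public static class QueryOrdering
{
    // Join first, filter last: every row is paired with its parent
    // before any row is discarded.
    public static List<string> JoinThenFilter(IQueryable<AttributeRow> rows, string schemaAttributeId)
    {
        var enriched = from a in rows
                       join b in rows on a.ParentAttributeId equals b.Id into parents
                       select new { Attribute = a, Parent = parents.FirstOrDefault() };
        return enriched
            .Where(e => e.Attribute.SchemaAttributeId == schemaAttributeId)
            .Select(e => e.Attribute.Id)
            .OrderBy(id => id)
            .ToList();
    }

    // Filter first: only the matching rows take part in the join.
    public static List<string> FilterThenJoin(IQueryable<AttributeRow> rows, string schemaAttributeId)
    {
        var filtered = rows.Where(a => a.SchemaAttributeId == schemaAttributeId);
        var enriched = from a in filtered
                       join b in rows on a.ParentAttributeId equals b.Id into parents
                       select new { Attribute = a, Parent = parents.FirstOrDefault() };
        return enriched
            .Select(e => e.Attribute.Id)
            .OrderBy(id => id)
            .ToList();
    }
}
```

This only demonstrates that the two orderings are equivalent for filters that look at the attribute's own fields. A filter that inspects the parent would still need the join before it can be evaluated.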

Reference: #909

Original prompt

Problem

When querying SCIM groups with a large number of members (20k-27k users), the MongoDB query takes an excessive amount of time:

  • MongoDB Atlas: 2+ minutes
  • Local MongoDB: ~30 seconds
  • Expected performance: ~500ms

Root Cause

Analysis of the MongoDB profiler output from issue #909 shows:

"planSummary" : "COLLSCAN",
"keysExamined" : 0,
"docsExamined" : 368167,
"millis" : 2677

The current implementation in SCIMRepresentationQueryRepository.cs (lines 33-42) generates an inefficient MongoDB aggregation pipeline:

  1. $project - restructures all documents first
  2. $lookup - performs join on all 368k documents
  3. $project - restructures again
  4. $match - filters at the end (too late!)

The filter is applied AFTER the join and projections, preventing MongoDB from using indexes. This forces a full collection scan (COLLSCAN).

Solution Required

Refactor the FindSCIMRepresentations method in src/Scim/SimpleIdServer.Scim.Persistence.MongoDB/SCIMRepresentationQueryRepository.cs to:

  1. Apply filters BEFORE joins: Extract filter conditions and apply them directly to the base query before the $lookup operation
  2. Optimize the query structure: Ensure the generated MongoDB pipeline starts with $match so indexes can be utilized
  3. Maintain functionality: Preserve all existing filtering logic while reordering operations for performance

Expected Pipeline Order

Current (inefficient):

$project → $lookup → $project → $match

Target (optimized):

$match → $project → $lookup → $project

This will allow MongoDB to:

  • Use indexes on SchemaAttributeId and ValueString
  • Reduce the dataset before expensive join operations
  • Achieve the target ~500ms performance for groups with 27k members

Files to Modify

  • src/Scim/SimpleIdServer.Scim.Persistence.MongoDB/SCIMRepresentationQueryRepository.cs

Requirements

  • Remove all code comments
  • Maintain backward compatibility with existing filter expressions
  • Preserve the current API contract
  • Ensure the optimization works for all filter types (attribute expressions, logical expressions, comparison expressions)

Reference: #909

This pull request was created from Copilot chat.



- Add EvaluateMongoDbAttributesDirect method to work directly with SCIMRepresentationAttribute
- Add EvaluateAttributesDirect extension methods for all expression types
- Update FindSCIMRepresentations to use direct attribute filtering without EnrichedAttribute join
- This allows MongoDB to use indexes on SchemaAttributeId and value fields
- Filters are now applied before expensive join operations

Co-authored-by: simpleidserver <10213388+simpleidserver@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor FindSCIMRepresentations for improved MongoDB query performance" to "Optimize MongoDB SCIM query by applying filters before joins" on Jan 15, 2026
Copilot AI requested a review from simpleidserver January 15, 2026 16:56