Optimize MongoDB SCIM query by applying filters before joins #945

Copilot · 2026-01-15T16:43:21Z

MongoDB queries for SCIM groups with 20k+ members take 2+ minutes due to filters being applied after expensive joins, forcing COLLSCAN on 368k documents.

Changes

Filter evaluation refactored to work directly on base collection:

- Added EvaluateMongoDbAttributesDirect() method operating on SCIMRepresentationAttribute instead of EnrichedAttribute
- Implemented EvaluateAttributesDirect() extension methods for all expression types (Attribute, Logical, Comparison)
- Removed expensive join from FindSCIMRepresentations(); filters now apply to the base query first

Before:

var filteredRepresentationAttributes = from a in _scimDbContext.SCIMRepresentationAttributeLst.AsQueryable()
    join b in _scimDbContext.SCIMRepresentationAttributeLst.AsQueryable() on a.ParentAttributeId equals b.Id into Parents
    select new EnrichedAttribute { Attribute = a, Parent = Parents.First() };
filteredRepresentationAttributes = parameter.Filter.EvaluateMongoDbAttributes(filteredRepresentationAttributes);

After:

var filteredRepresentationAttributes = _scimDbContext.SCIMRepresentationAttributeLst.AsQueryable();
filteredRepresentationAttributes = parameter.Filter.EvaluateMongoDbAttributesDirect(filteredRepresentationAttributes);

The MongoDB pipeline changes from $project → $lookup → $project → $match to $match → ..., which enables index usage on SchemaAttributeId and the value fields. Query time is expected to drop from 2+ minutes to ~500ms.
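To illustrate why reordering preserves results, here is a minimal in-memory sketch using LINQ to Objects rather than the MongoDB driver. The `Attr` record and the `FilterOrderSketch` class are hypothetical, simplified stand-ins and are not part of this PR. The equivalence shown holds only under the assumption that the filter reads fields of the attribute itself and never fields of the joined parent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for SCIMRepresentationAttribute.
// An empty ParentAttributeId marks a root attribute in this sketch.
public sealed record Attr(string Id, string ParentAttributeId, string SchemaAttributeId, string ValueString);

public static class FilterOrderSketch
{
    // Old shape: pair every attribute with its parent, then filter.
    // The join runs over the whole input before any row is discarded.
    public static List<string> JoinThenFilter(IEnumerable<Attr> attrs, Func<Attr, bool> predicate)
    {
        var all = attrs.ToList();
        var enriched = from a in all
                       join b in all on a.ParentAttributeId equals b.Id into parents
                       select new { Attribute = a, Parent = parents.FirstOrDefault() };
        return enriched.Where(e => predicate(e.Attribute))
                       .Select(e => e.Attribute.Id)
                       .ToList();
    }

    // New shape: filter the base sequence first, so any later join
    // only sees the rows that survived the filter.
    public static List<string> FilterFirst(IEnumerable<Attr> attrs, Func<Attr, bool> predicate)
        => attrs.Where(predicate).Select(a => a.Id).ToList();
}
```

Both methods return the same ids for a predicate that touches only the attribute's own fields. The difference is how many rows pass through the join, which is the cost the pipeline reordering is meant to avoid.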

Reference: #909

Original prompt

Problem

When querying SCIM groups with a large number of members (20k-27k users), the MongoDB query takes an excessive amount of time:

MongoDB Atlas: 2+ minutes

Local MongoDB: ~30 seconds

Expected performance: ~500ms

Root Cause

Analysis of the MongoDB profiler output from issue #909 shows:

"planSummary" : "COLLSCAN",
"keysExamined" : 0,
"docsExamined" : 368167,
"millis" : 2677

The current implementation in SCIMRepresentationQueryRepository.cs (lines 33-42) generates an inefficient MongoDB aggregation pipeline:

$project - restructures all documents first

$lookup - performs join on all 368k documents

$project - restructures again

$match - filters at the end (too late!)

The filter is applied AFTER the join and projections, preventing MongoDB from using indexes. This forces a full collection scan (COLLSCAN).

Solution Required

Refactor the FindSCIMRepresentations method in src/Scim/SimpleIdServer.Scim.Persistence.MongoDB/SCIMRepresentationQueryRepository.cs to:

Apply filters BEFORE joins: Extract filter conditions and apply them directly to the base query before the $lookup operation

Optimize the query structure: Ensure the generated MongoDB pipeline starts with $match so indexes can be utilized

Maintain functionality: Preserve all existing filtering logic while reordering operations for performance

Expected Pipeline Order

Current (inefficient):

$project → $lookup → $project → $match

Target (optimized):

$match → $project → $lookup → $project

This will allow MongoDB to:

Use indexes on SchemaAttributeId and ValueString

Reduce the dataset before expensive join operations

Achieve the target ~500ms performance for groups with 27k members

Files to Modify

src/Scim/SimpleIdServer.Scim.Persistence.MongoDB/SCIMRepresentationQueryRepository.cs

Requirements

Remove all code comments

Maintain backward compatibility with existing filter expressions

Preserve the current API contract

Ensure the optimization works for all filter types (attribute expressions, logical expressions, comparison expressions)
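The PR description does not show the bodies of the EvaluateAttributesDirect() extension methods, so the following is only a sketch of one way comparison and logical filters could be expressed as predicates over the base attribute type and applied as the first operator on an IQueryable. `AttrRow`, `DirectFilterSketch`, `Equal`, `Or`, and `Apply` are hypothetical names introduced for illustration. The sketch runs against an in-memory list; with a LINQ provider such as the MongoDB driver's, a leading `Where` should be what ends up as the initial `$match` stage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical, simplified stand-in for SCIMRepresentationAttribute.
public sealed class AttrRow
{
    public string SchemaAttributeId { get; set; } = "";
    public string ValueString { get; set; } = "";
}

public static class DirectFilterSketch
{
    // Comparison expression: the attribute belongs to the given schema
    // attribute and its string value equals the given value.
    public static Expression<Func<AttrRow, bool>> Equal(string schemaAttributeId, string value)
        => a => a.SchemaAttributeId == schemaAttributeId && a.ValueString == value;

    // Logical expression: OR two predicates by rewriting both bodies
    // onto one shared parameter, so the result is a single lambda.
    public static Expression<Func<AttrRow, bool>> Or(
        Expression<Func<AttrRow, bool>> left,
        Expression<Func<AttrRow, bool>> right)
    {
        var p = Expression.Parameter(typeof(AttrRow), "a");
        var body = Expression.OrElse(Rebind(left, p), Rebind(right, p));
        return Expression.Lambda<Func<AttrRow, bool>>(body, p);
    }

    // Applies the predicate directly to the base query, before any join.
    public static IQueryable<AttrRow> Apply(
        IQueryable<AttrRow> source,
        Expression<Func<AttrRow, bool>> predicate)
        => source.Where(predicate);

    private static Expression Rebind(Expression<Func<AttrRow, bool>> e, ParameterExpression p)
        => new Swap(e.Parameters[0], p).Visit(e.Body);

    // Replaces one lambda parameter with another throughout an expression.
    private sealed class Swap : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public Swap(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
            => node == _from ? _to : base.VisitParameter(node);
    }
}
```

Keeping each filter as an expression tree over the base type, rather than over a joined wrapper type, is what lets the predicate sit at the front of the query. An AND combinator would follow the same pattern with `Expression.AndAlso`.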

Reference: #909

This pull request was created from Copilot chat.


- Add EvaluateMongoDbAttributesDirect method to work directly with SCIMRepresentationAttribute
- Add EvaluateAttributesDirect extension methods for all expression types
- Update FindSCIMRepresentations to use direct attribute filtering without EnrichedAttribute join
- This allows MongoDB to use indexes on SchemaAttributeId and value fields
- Filters are now applied before expensive join operations

Co-authored-by: simpleidserver <10213388+simpleidserver@users.noreply.github.com>

Initial plan

789533e

Copilot AI assigned Copilot and simpleidserver Jan 15, 2026

Copilot started work on behalf of simpleidserver January 15, 2026 16:43

Copilot AI changed the title ~~[WIP] Refactor FindSCIMRepresentations for improved MongoDB query performance~~ Optimize MongoDB SCIM query by applying filters before joins Jan 15, 2026

Copilot AI requested a review from simpleidserver January 15, 2026 16:56

Copilot finished work on behalf of simpleidserver January 15, 2026 16:56
