perf(origdatablocks): add datasetId index to OrigDatablock and Datablock by alubbock · Pull Request #2725 · SciCatProject/backend

alubbock · 2026-05-08T21:55:07Z

Description

Adds a MongoDB index on datasetId to the OrigDatablock and Datablock collections.

Motivation

All $lookup pipelines that join datasets to their blocks filter on datasetId, but neither collection had an index on that field. Without an index, each lookup is a full collection scan: O(n) in the number of block documents, repeated for every parent dataset. At scale this is the dominant cost of the dataset detail view (which eagerly joins origdatablocks and datablocks by default) and of any archival workflow that loads blocks by dataset.
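
To make the cost argument concrete, here is a minimal in-memory sketch of the two access patterns. It is a simplified model, not SciCat or MongoDB code: the Block shape, the function names and the sample ids are all hypothetical, and a Map stands in for the B-tree index MongoDB would actually use. It only counts how many block documents each strategy has to examine.

```typescript
// Hypothetical, simplified model of the join cost; not SciCat code.
interface Block {
  datasetId: string;
  size: number;
}

interface JoinResult {
  matches: Map<string, Block[]>;
  examined: number; // number of block documents inspected
}

// No index: every parent dataset triggers a scan of the whole block collection.
function joinWithScan(datasetIds: string[], blocks: Block[]): JoinResult {
  const matches = new Map<string, Block[]>();
  let examined = 0;
  for (const id of datasetIds) {
    const found: Block[] = [];
    for (const block of blocks) {
      examined++;
      if (block.datasetId === id) found.push(block);
    }
    matches.set(id, found);
  }
  return { matches, examined };
}

// Stand-in for the index: group blocks by datasetId once, up front.
function buildIndex(blocks: Block[]): Map<string, Block[]> {
  const index = new Map<string, Block[]>();
  for (const block of blocks) {
    const bucket = index.get(block.datasetId) ?? [];
    bucket.push(block);
    index.set(block.datasetId, bucket);
  }
  return index;
}

// With the index: each lookup touches only the blocks that match.
function joinWithIndex(datasetIds: string[], index: Map<string, Block[]>): JoinResult {
  const matches = new Map<string, Block[]>();
  let examined = 0;
  for (const id of datasetIds) {
    const found = index.get(id) ?? [];
    examined += found.length;
    matches.set(id, found);
  }
  return { matches, examined };
}

// Three datasets with two blocks each (made-up sample data).
const sampleIds = ["ds-a", "ds-b", "ds-c"];
const sampleBlocks: Block[] = sampleIds.flatMap((id) => [
  { datasetId: id, size: 10 },
  { datasetId: id, size: 20 },
]);

const scan = joinWithScan(sampleIds, sampleBlocks);
const indexed = joinWithIndex(sampleIds, buildIndex(sampleBlocks));
console.log(scan.examined, indexed.examined); // 18 6
```

In this toy model the scan examines (datasets × blocks) documents while the indexed join examines only the matching ones, which is the gap the motivation above describes.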

Changes:

src/origdatablocks/schemas/origdatablock.schema.ts -- OrigDatablockSchema.index({ datasetId: 1 })
src/datablocks/schemas/datablock.schema.ts -- DatablockSchema.index({ datasetId: 1 })

Tests included

Included for each change/fix?
Passing?

Two new schema regression tests (origdatablock.schema.spec.ts, datablock.schema.spec.ts) verify that the index definition is present on the compiled Mongoose schema, guarding against accidental removal.
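
For readers who have not written this kind of check, the sketch below shows one way such an assertion could be expressed. It is not the test code from this PR. It assumes only that Mongoose's schema.indexes() returns a list of [fields, options] pairs; the helper name hasSingleFieldIndex and the sample entries are hypothetical, and the input is hand-written so the snippet runs without Mongoose installed.

```typescript
// Shape of one entry as returned by Mongoose's schema.indexes(): [fields, options].
type IndexEntry = [Record<string, unknown>, Record<string, unknown>?];

// True if some declared index is keyed on exactly the given field.
// Hypothetical helper for illustration; not part of Mongoose or SciCat.
function hasSingleFieldIndex(indexes: IndexEntry[], field: string): boolean {
  return indexes.some(([fields]) => {
    const keys = Object.keys(fields);
    return keys.length === 1 && keys[0] === field;
  });
}

// Hand-written stand-ins for what schema.indexes() might return.
const withDatasetIdIndex: IndexEntry[] = [[{ datasetId: 1 }, {}]];
const compoundOnly: IndexEntry[] = [[{ datasetId: 1, createdAt: -1 }, {}]];

console.log(hasSingleFieldIndex(withDatasetIdIndex, "datasetId")); // true
console.log(hasSingleFieldIndex(compoundOnly, "datasetId")); // false
console.log(hasSingleFieldIndex([], "datasetId")); // false
```

In a real spec file the first argument would come from something like OrigDatablockSchema.indexes(), and the boolean would be wrapped in the test framework's assertion. The helper deliberately ignores the options object, since defaults there can differ between Mongoose versions.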

Documentation

swagger documentation updated (required for API changes) -- n/a, no API change
official documentation updated -- n/a, internal index definition only

Summary by Sourcery

Add a MongoDB index on datasetId to datablock-related collections and guard it with schema regression tests.

New Features:

Introduce a datasetId index on the Datablock collection.
Introduce a datasetId index on the OrigDatablock collection.

Tests:

Add schema regression tests to verify the datasetId index exists on Datablock and OrigDatablock Mongoose schemas.

sourcery-ai

Hey - I've reviewed your changes and they look great!

Junjiequan

lgtm

alubbock added 2 commits May 8, 2026 22:07

perf: add datasetId index to OrigDatablock and Datablock collections

2628743

All $lookup pipelines joining datasets to their blocks filter on datasetId, causing a full collection scan per parent document without this index (PERF-001).

test: add regression tests for datasetId index on block schemas

9b2a5b9

alubbock requested a review from a team as a code owner May 8, 2026 21:55

sourcery-ai Bot reviewed May 8, 2026

style: fix prettier formatting in origdatablock schema spec

73d0576

Junjiequan approved these changes May 12, 2026

omkar-ethz approved these changes May 19, 2026

Merge branch 'master' into perf/001-datasetid-index

5ee8929

alubbock enabled auto-merge May 19, 2026 08:49

alubbock merged commit 006da4b into SciCatProject:master May 19, 2026
20 of 21 checks passed

alubbock deleted the perf/001-datasetid-index branch May 19, 2026 17:18

alubbock merged 4 commits into SciCatProject:master from rosalindfranklininstitute:perf/001-datasetid-index
