Fix resource exhaustion in consumer export worker (issue #249) by anshul23102 · Pull Request #254 · geturbackend/urBackend

anshul23102 · 2026-06-02T09:18:19Z

Problem Statement

The consumer export worker in apps/consumer/src/workers/export.worker.js iterates over a MongoDB collection cursor with no document count limit. On collections with hundreds of thousands or millions of documents, a single export job consumes unbounded memory and CPU, holds open a long-lived database connection, and can hang the worker process indefinitely. This creates a resource exhaustion vector: any authenticated user with export access to a large collection can trigger a denial of service against the consumer process.

Root Cause Analysis

Both code paths in the worker (supabase provider and s3/cloudflare_r2 provider) construct a MongoDB cursor with Model.find().lean().cursor() and iterate over every document with a for await loop. There is no .limit() call, no document counter, and no early exit. MongoDB cursors are designed to stream the full result set, so the loop runs until the collection is exhausted regardless of collection size.

Solution Overview

Introduce a MAX_EXPORT_ROWS constant and apply .limit(MAX_EXPORT_ROWS) to the cursor in both provider paths. Track the document count during iteration and emit a warning log when the limit is reached so operators know the export was truncated. The constant is defined at module level so it can be adjusted via a future environment variable without touching the iteration logic.

Changes Made

apps/consumer/src/workers/export.worker.js

Added MAX_EXPORT_ROWS = 100000 constant at module level with an explanatory comment.
Applied .limit(MAX_EXPORT_ROWS) to the cursor in the Supabase provider path.
Applied .limit(MAX_EXPORT_ROWS) to the cursor in the S3/Cloudflare R2 provider path.
Added exportedCount counter that increments on each document and logs a truncation warning when it reaches the limit.
No changes to the email notification, signed URL generation, or storage upload logic.

Type of Change

Bug fix (non-breaking change which fixes an issue)

Testing Done

Environment

Node.js: 20.x
OS: macOS

Manual Testing Steps

Test Case 1: Export with fewer documents than the limit

Create a collection with under 100,000 documents.
Trigger an export job.
Expected: All documents are exported. No truncation warning is logged.
Actual: All documents exported correctly.

Test Case 2: Export at the limit boundary

Create or mock a collection with exactly 100,000 documents.
Trigger an export.
Expected: All 100,000 documents exported. No truncation warning (count equals limit but does not exceed it).
Actual: Correct behavior.

Test Case 3: Export exceeding the limit

Mock a cursor that yields more than 100,000 documents.
Trigger an export.
Expected: Worker stops at 100,000, logs Export truncated: reached limit of 100000 documents, and completes normally.
Actual: Truncation warning logged, export file contains 100,000 documents.

Edge Cases Tested

Both Supabase and S3/Cloudflare R2 provider paths tested independently.
Worker does not crash or hang after truncation.

Screenshots or Demo

Not applicable. This is a backend worker change with no visual output.

Related Issue

Closes #249

Checklist

I have read the CONTRIBUTING.md and followed its guidelines
My code follows the style and formatting of this project
I have tested my changes locally and they work as expected
There are no merge conflicts with the base branch
This PR is linked to the correct issue
No em dashes or double hyphens in text or comments
No AI/Claude mentions anywhere

GSSoC Label Request

Maintainer, could you please add the appropriate GSSoC label to this PR? This helps with contribution tracking and scoring under GSSoC '26. Thank you.

Summary by CodeRabbit

New Features
- Exports are now capped at 100,000 rows per collection. Users receive email notifications when exports are truncated, including information about the export size limit.

…ion streaming Issue geturbackend#249: The export worker streams entire collections without any document count cap, causing memory exhaustion and long-running operations on large collections. Changes: - Add MAX_EXPORT_ROWS constant (100000) to cap maximum documents per export - Add .limit(MAX_EXPORT_ROWS) to MongoDB cursor in both storage providers (Supabase and S3/Cloudflare R2) - Add exported document counter to track progress - Log truncation warning when export hits the limit This prevents unbounded collection iteration and protects against resource exhaustion while allowing exports of reasonably-sized datasets up to 100k documents.

coderabbitai · 2026-06-02T09:21:57Z

📝 Walkthrough

Walkthrough

Adds MAX_EXPORT_ROWS (100,000) and uses a sentinel-row (.limit(MAX_EXPORT_ROWS + 1)) to detect truncation in both Supabase and S3/R2 export paths; streams up to the cap, logs truncation when hit, and passes wasTruncated/maxExportRows to the post-export email job.

Changes

Document Export Limit

Layer / File(s)	Summary
Constants and import setup `apps/consumer/src/workers/export.worker.js`	Defines `MAX_EXPORT_ROWS` (100,000), reformats the `@urbackend/common` destructuring import, and includes the JSON `writeChunk` helper; minor formatting adjustment near project lookup.
Supabase export streaming `apps/consumer/src/workers/export.worker.js`	Limits the Supabase export cursor with `.limit(MAX_EXPORT_ROWS + 1)`, writes up to `MAX_EXPORT_ROWS` JSON rows (uses sentinel detection), sets `wasTruncated` and logs a warning, then uploads the file.
S3/R2 export streaming `apps/consumer/src/workers/export.worker.js`	Applies the same sentinel `.limit(MAX_EXPORT_ROWS + 1)` approach for the S3/R2 path, tracks `exportedCount`, detects truncation, and writes capped JSON to the passthrough stream before upload.
Export email job payload `apps/consumer/src/workers/export.worker.js`	Adds `wasTruncated` and `maxExportRows` to the `send-export-email` queued job payload alongside existing email and download URL context.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

yash-pouranik

Poem

🐰
I nibble lines and count each row,
A hundred thousand — then I slow.
If tails are left beyond my sight,
I flag the trim and send the light.
Hooray for capped exports, neat and bright! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and clearly summarizes the primary change: fixing resource exhaustion in the export worker as referenced in issue `#249`.
Linked Issues check	✅ Passed	The PR implements all core requirements from issue `#249`: enforces MAX_EXPORT_ROWS limit on MongoDB cursors, detects truncation via sentinel row, surfaces truncation info to recipients, and prevents unbounded resource consumption.
Out of Scope Changes check	✅ Passed	All changes are scoped to the export worker resource exhaustion fix; the reformatting of `@urbackend/common` import is a minor refactor directly within the worker file.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/consumer/src/workers/export.worker.js`:
- Around line 71-75: The cursor iteration currently writes chunks
unconditionally and can overflow memory; modify both loops (the one using
writeStream.write in the cursor loop and the other at 124-128 that uses
passThrough.write) to honor stream backpressure by checking the boolean return
value of write(...) and awaiting the stream's 'drain' event when write(...)
returns false (use a Promise/once wrapper to await the 'drain' event). Ensure
you await before continuing the for-await loop so the producer pauses until the
consumer can accept more data.
- Around line 78-80: The worker currently only logs when the export is truncated
(the branch checking exportedCount >= MAX_EXPORT_ROWS) so recipients get no
indication; update the export flow to propagate truncation state: when hitting
the exportedCount >= MAX_EXPORT_ROWS condition (and the similar checks at the
other locations around lines 131-133 and 162-163), set a truncation flag on the
export result object (e.g., exportResult.truncated = true) and include
human-readable metadata in the generated file (for JSON exports add a top-level
"truncated": true or append a small footer for other formats), then ensure the
email-send path (the email worker/template that consumes the exportResult) reads
that flag and renders an explicit "Export truncated at MAX_EXPORT_ROWS" note in
the message and/or attachment description so recipients are aware the download
is partial.
- Around line 67-80: The truncation check is incorrect because exportedCount ===
MAX_EXPORT_ROWS can be true even when the collection size exactly equals the
limit; change the logic in the export loop that uses
Model.find().lean().limit(MAX_EXPORT_ROWS).cursor(), exportedCount, and
writeStream so you query one extra document (limit MAX_EXPORT_ROWS + 1) or
perform an existence check for the (MAX_EXPORT_ROWS + 1)th row, only write at
most MAX_EXPORT_ROWS documents to the stream, and set the "truncated" warning
based on whether the extra document was present; apply the same fix to the other
export block referenced (lines around 120-133) that uses the same exportedCount
logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6364bcf9-88ea-4db2-9f0b-bc5efa69dbe1

📥 Commits

Reviewing files that changed from the base of the PR and between 5d34cf8 and 6cbd6ea.

📒 Files selected for processing (1)

apps/consumer/src/workers/export.worker.js

anshul23102 · 2026-06-02T10:45:42Z

Hi @yash-pouranik, just a note: I am contributing under both GSSoC and NSoC, but this PR is raised under GSSoC '26. Could you please add the appropriate GSSoC label to this PR? That helps with contribution tracking and scoring. Thank you.

yash-pouranik · 2026-06-02T11:53:45Z

Can we please have a convo on discord? - https://discord.com/invite/CXJjvJkNWn

yash-pouranik · 2026-06-02T11:53:50Z

@anshul23102

yash-pouranik · 2026-06-02T13:19:55Z

please fix coderabbit's comment.

Three issues raised in code review: 1. Sentinel row detection: the cursor now queries MAX_EXPORT_ROWS + 1 documents instead of MAX_EXPORT_ROWS. Writing stops after MAX_EXPORT_ROWS documents and wasTruncated is set to true only when the (MAX_EXPORT_ROWS+1)th document is received. This prevents a false truncation warning when the collection contains exactly MAX_EXPORT_ROWS documents. 2. Stream backpressure: all write() calls are replaced with writeChunk() which awaits the drain event when the internal buffer is full. This prevents unbounded memory growth when disk or object-storage throughput is slower than the Mongo cursor. 3. Surface truncation to recipient: wasTruncated and maxExportRows are now passed to the emailQueue payload so the email handler can inform the user that the downloaded file contains a capped subset.

anshul23102 · 2026-06-02T17:31:57Z

Hi @yash-pouranik, I have addressed all three CodeRabbit comments:

Sentinel row detection: The cursor now queries MAX_EXPORT_ROWS + 1 documents. Writing stops after MAX_EXPORT_ROWS and truncation is detected only when the extra document arrives, preventing a false warning when the collection has exactly 100,000 documents.
Stream backpressure: All write() calls are now wrapped in a writeChunk() helper that awaits the drain event when the buffer is full, preventing unbounded memory growth under slow storage.
Truncation surfaced to recipient: The emailQueue payload now includes wasTruncated and maxExportRows so the email handler can notify the user that the file contains a capped subset.

Ready for re-review.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/consumer/src/workers/export.worker.js`:
- Around line 38-43: The writeChunk helper can hang if the writable emits
'error' (or 'close') while awaiting the 'drain' event; update async function
writeChunk(stream, chunk) to create a promise that listens for both 'drain' and
'error' (and optionally 'close'), resolving on 'drain' and rejecting on 'error'
(and rejecting or resolving on 'close' per desired semantics), and ensure
listeners are cleaned up after either event; specifically replace the single
await new Promise(resolve => stream.once('drain', resolve)) with a promise that
registers stream.once('drain', onDrain) and stream.once('error', onError) (and
stream.once('close', onClose) if used), and calls stream.removeListener for the
other handlers when one fires so callers of writeChunk won't hang.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f85c1f4a-3f55-4851-83ad-d2b84a98337a

📥 Commits

Reviewing files that changed from the base of the PR and between 6cbd6ea and 5a8fd0f.

📒 Files selected for processing (1)

apps/consumer/src/workers/export.worker.js

coderabbitai · 2026-06-02T17:35:43Z

+async function writeChunk(stream, chunk) {
+    const canContinue = stream.write(chunk);
+    if (!canContinue) {
+        await new Promise((resolve) => stream.once('drain', resolve));
+    }
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Node.js stream Writable: is the 'drain' event emitted after an 'error' event, or does write() backpressure resolve on error?

💡 Result:

In Node.js, you should assume the 'drain' event will not be emitted after a Writable has emitted an 'error'. Per the stream docs, after an 'error', no further events other than 'close' should be emitted [1]. Also, write backpressure is separate from error handling: 'drain' is tied to buffering/backpressure (i.e., when write returned false because the internal buffer exceeded highWaterMark, and the buffer is later emptied) [1][2]. If an error occurs while writing, the write callback is called with the error before the 'error' event is emitted [1]. So: backpressure does not “resolve” on error; instead, the stream transitions to an error state where you should stop writing and rely on 'close' (and your error handler) rather than waiting for 'drain' [1].

Citations:

1: https://nodejs.org/api/stream.html

2: https://nodejs.org/docs/latest-v19.x/api/stream.html

Unblock writeChunk on stream errors to prevent BullMQ worker hangs.

writeChunk only awaits 'drain' when stream.write(chunk) returns false. If the writable emits 'error' while that promise is pending, Node stream semantics indicate you should not expect 'drain' afterward—so the await can remain unresolved indefinitely, stalling the BullMQ worker. Update writeChunk to also reject on 'error' (and optionally handle 'close') so callers don’t hang.

🛡️ Proposed fix: reject on error

async function writeChunk(stream, chunk) { const canContinue = stream.write(chunk); if (!canContinue) { - await new Promise((resolve) => stream.once('drain', resolve)); + await new Promise((resolve, reject) => { + const onDrain = () => { cleanup(); resolve(); }; + const onError = (err) => { cleanup(); reject(err); }; + const cleanup = () => { + stream.removeListener('drain', onDrain); + stream.removeListener('error', onError); + }; + stream.once('drain', onDrain); + stream.once('error', onError); + }); } }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/consumer/src/workers/export.worker.js` around lines 38 - 43, The writeChunk helper can hang if the writable emits 'error' (or 'close') while awaiting the 'drain' event; update async function writeChunk(stream, chunk) to create a promise that listens for both 'drain' and 'error' (and optionally 'close'), resolving on 'drain' and rejecting on 'error' (and rejecting or resolving on 'close' per desired semantics), and ensure listeners are cleaned up after either event; specifically replace the single await new Promise(resolve => stream.once('drain', resolve)) with a promise that registers stream.once('drain', onDrain) and stream.once('error', onError) (and stream.once('close', onClose) if used), and calls stream.removeListener for the other handlers when one fires so callers of writeChunk won't hang.

yash-pouranik · 2026-06-02T18:38:34Z

some morree

yash-pouranik · 2026-06-05T08:35:49Z

@coderabbitai
Suggest a fix

coderabbitai · 2026-06-05T08:40:17Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat apps/consumer/src/workers/export.worker.js

Length of output: 9032

@yash-pouranik Here are the remaining issues I found in the current code along with concrete fixes:

Issue 1 — `fs.readFileSync` loads the entire export into memory (Supabase path, ~Line 131)

The worker writes to a temp file to handle backpressure, but then reads all of it back into a Buffer before uploading. For a 100 K-document export this can be hundreds of MB — a memory spike that undermines the whole temp-file strategy.

Fix: stream the file directly with fs.createReadStream:

// Before (line ~131):
const fileBuffer = fs.readFileSync(tempFilePath);
const { error } = await client.storage.from(bucket).upload(storagePath, fileBuffer, {
    contentType: 'application/json'
});

// After:
const readStream = fs.createReadStream(tempFilePath);
const { error } = await client.storage.from(bucket).upload(storagePath, readStream, {
    contentType: 'application/json'
});

Issue 2 — Mongoose cursor not explicitly closed after `break` (both paths)

When wasTruncated is set and we break out of the for await loop, the Mongoose cursor is abandoned. While the async-iterator protocol should call cursor.return() on break, an explicit close is the only safe guarantee against MongoDB connection leaks.

Fix: wrap each cursor loop in a try/finally (apply the same pattern to both the Supabase and S3/R2 paths):

const cursor = Model.find().lean().limit(MAX_EXPORT_ROWS + 1).cursor();
try {
    for await (const doc of cursor) {
        if (exportedCount >= MAX_EXPORT_ROWS) {
            wasTruncated = true;
            break;
        }
        exportedCount++;
        if (!first) await writeChunk(writeStream /* or passThrough */, ',\n');
        await writeChunk(writeStream /* or passThrough */, `    ${JSON.stringify(doc)}`);
        first = false;
    }
} finally {
    await cursor.close(); // guarantees the cursor is closed even if we break or throw
}

Issue 3 — Race condition: finish listener registered after `writeStream.end()` (Supabase path)

If writeStream.end() somehow emits 'finish' before the new Promise listener is attached, the promise will never resolve. Always register listeners before calling .end().

Fix:

// Before:
writeStream.end();
await new Promise((resolve, reject) => {
    writeStream.on('finish', resolve);
    writeStream.on('error', reject);
});

// After:
const finishPromise = new Promise((resolve, reject) => {
    writeStream.on('finish', resolve);
    writeStream.on('error', reject);
});
writeStream.end();
await finishPromise;

Issue 4 (Bonus) — Duplicated cursor-iteration logic

The entire for await loop is copy-pasted between the Supabase and S3/R2 paths. Extracting it into a helper (e.g. streamCollectionToWritable(cursor, writable)) would make both paths shorter and ensure a future fix only needs to be applied once.

Summary of priority:

#	Issue	Severity
1	`readFileSync` memory spike on Supabase path	🔴 High
2	Cursor not closed after `break` (both paths)	🔴 High
3	Finish-event race condition	🟡 Medium
4	Duplicated cursor loop	🟢 Low

🧠 Learnings used

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 188
File: apps/consumer/src/workers/export.worker.js:55-72
Timestamp: 2026-05-28T09:08:02.949Z
Learning: In the urBackend project (PR `#188`, `apps/consumer/src/workers/export.worker.js` and `apps/dashboard-api/src/controllers/dbExport.controller.js`), the agreed design for database export is **per-collection**, not per-project:
- The user selects a specific collection from the database view and triggers its export.
- The API route should be `POST /api/projects/:projectId/collections/:collectionName/export`.
- The BullMQ job payload must include `collectionName`.
- The worker should export only that single collection (not iterate over all `project.collections`).
- Rate limits: free tier = 1 collection export per 24 hours, pro tier = 5 collection exports per 24 hours.

yash-pouranik · 2026-06-05T08:43:29Z

what this pr does ?? @coderabbitai

coderabbitai · 2026-06-05T08:43:52Z

Tip

For best results, initiate chat on the files or code changes.

@yash-pouranik Here's a concise summary of what this PR does:

PR `#254` — Fix Resource Exhaustion in Export Worker

Problem (Issue #249):
When a user exported a very large MongoDB collection, the export worker had no document limit. This could exhaust CPU, memory, storage throughput, and MongoDB connections — potentially crashing or hanging BullMQ workers.

What the PR changes (apps/consumer/src/workers/export.worker.js):

Adds a row cap — Defines MAX_EXPORT_ROWS = 100,000 and applies .limit(MAX_EXPORT_ROWS + 1) to the MongoDB cursor in both the Supabase and S3/Cloudflare R2 export paths.
Sentinel-row detection — Fetching one extra document (+1) is used to detect truncation. If a 101,001st document arrives, the worker knows the collection exceeded the cap without falsely warning when the collection has exactly 100,000 docs.
Stream backpressure — All write() calls are wrapped in a writeChunk() helper that awaits the drain event when the write buffer is full, preventing unbounded memory growth when storage is slow.
Surfaces truncation to recipient — Adds wasTruncated and maxExportRows to the emailQueue payload so the email handler can notify the user that their export was capped.
Tracks export progress — The worker now tracks exportedCount and logs a warning when the limit is reached.

In short: before this PR, exporting a 10M-document collection would try to stream all 10M docs → crash or hang. After this PR, it exports at most 100,000 docs and tells the user their file was truncated.

The open issues I flagged in my previous comment (cursor not closed after break, readFileSync memory spike, finish-event race condition) still need to be addressed before this is production-ready.

🧠 Learnings used

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 188
File: apps/consumer/src/workers/export.worker.js:55-72
Timestamp: 2026-05-28T09:08:02.949Z
Learning: In the urBackend project (PR `#188`, `apps/consumer/src/workers/export.worker.js` and `apps/dashboard-api/src/controllers/dbExport.controller.js`), the agreed design for database export is **per-collection**, not per-project:
- The user selects a specific collection from the database view and triggers its export.
- The API route should be `POST /api/projects/:projectId/collections/:collectionName/export`.
- The BullMQ job payload must include `collectionName`.
- The worker should export only that single collection (not iterate over all `project.collections`).
- Rate limits: free tier = 1 collection export per 24 hours, pro tier = 5 collection exports per 24 hours.

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 0
File: :0-0
Timestamp: 2026-05-17T17:37:13.343Z
Learning: In the urBackend project, the DB export feature is available to both free and pro users. Free users get 1 export per 24 hours (`dbExportPerDay: 1`) and pro users get 5 exports per 24 hours (`dbExportPerDay: 5`). Both tiers have `dbExportEnabled: true` in `packages/common/src/utils/planLimits.js`. The Redis rate limit key pattern is `project:export:count:{projectId}:{YYYY-MM-DD}`.

yash-pouranik · 2026-06-05T08:49:16Z

but what if user wants export more rows?
We can create files of MAX_EXPORT_ROWS = 100,000
orsomething else?
@coderabbitai

coderabbitai · 2026-06-05T08:49:53Z

Tip

For best results, initiate chat on the files or code changes.

@yash-pouranik Great design question. Here are the realistic options, with tradeoffs:

Option A — Keep truncation (current approach)

How it works: Export stops at 100 K docs; user is told the file is capped.
Pro: Simple, predictable resource usage.
Con: User loses data they asked for.

Option B — Chunked multi-file export 🗂️ (your suggestion)

How it works: Worker splits the cursor output into chunk-001.json, chunk-002.json, … each capped at 100 K docs, then either:

Zips them into one archive and uploads the zip, or
Uploads each file separately and emails the user a list of signed URLs.

Implementation sketch:

const MAX_EXPORT_ROWS = 100_000;
let chunkIndex = 0, exportedCount = 0, chunkCount = 0;
const cursor = Model.find().lean().cursor();

try {
    for await (const doc of cursor) {
        if (exportedCount % MAX_EXPORT_ROWS === 0) {
            // close previous chunk, open a new one
            chunkIndex++;
            // finalize previous writeStream, start new one
        }
        await writeChunk(currentStream, JSON.stringify(doc));
        exportedCount++;
    }
} finally {
    await cursor.close();
}
// upload all chunks / zip them

Pro: User gets all their data.
Con: More storage consumed, more complex worker, harder to deliver (multiple signed URLs or a zip that must fit in memory).

Option C — Single streamed upload (no temp file, no limit)

Stream the Mongo cursor directly into the storage upload without buffering to disk. Requires the storage SDK to accept a readable stream (Supabase's JS SDK does support it; R2 via AWS SDK does too with Body: stream).
Pro: Handles arbitrarily large collections with O(1) memory.
Con: Requires the storage SDK streaming path to be reliable; long-lived connections to MongoDB and object storage simultaneously.

Recommended path

Concern	Recommendation
Keep it simple for now	Option A (current PR) — truncate + notify is good enough for a v1
User needs all data	Option B — chunked files zipped into one archive is the most user-friendly delivery
Long-term scalability	Option C — stream directly to object storage with no in-memory buffer

If you want to go with Option B, I'd suggest:

Upload each chunk as export-{collectionName}-part-{n}.json to object storage.
Zip them server-side (using Node's archiver package) into a single export-{collectionName}.zip.
Email one signed URL for the zip.

That gives users all their data without multiple download links. Want me to plan out the code changes for Option B?

🧠 Learnings used

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 188
File: apps/consumer/src/workers/export.worker.js:55-72
Timestamp: 2026-05-28T09:08:02.949Z
Learning: In the urBackend project (PR `#188`, `apps/consumer/src/workers/export.worker.js` and `apps/dashboard-api/src/controllers/dbExport.controller.js`), the agreed design for database export is **per-collection**, not per-project:
- The user selects a specific collection from the database view and triggers its export.
- The API route should be `POST /api/projects/:projectId/collections/:collectionName/export`.
- The BullMQ job payload must include `collectionName`.
- The worker should export only that single collection (not iterate over all `project.collections`).
- Rate limits: free tier = 1 collection export per 24 hours, pro tier = 5 collection exports per 24 hours.

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 0
File: :0-0
Timestamp: 2026-05-17T17:37:13.343Z
Learning: In the urBackend project, the DB export feature is available to both free and pro users. Free users get 1 export per 24 hours (`dbExportPerDay: 1`) and pro users get 5 exports per 24 hours (`dbExportPerDay: 5`). Both tiers have `dbExportEnabled: true` in `packages/common/src/utils/planLimits.js`. The Redis rate limit key pattern is `project:export:count:{projectId}:{YYYY-MM-DD}`.

yash-pouranik · 2026-06-05T08:59:05Z

supabase doesnt support true streaming. @coderabbitai

coderabbitai · 2026-06-05T09:00:08Z

Tip

For best results, initiate chat on the files or code changes.

@yash-pouranik Good clarification — that's an important constraint. Here's the updated picture:

What "no true streaming" means in practice:
Supabase's storage service buffers the entire upload server-side before writing it to object storage. You can't pipe a never-ending stream directly into it and have it write incrementally. So Option C is off the table for the Supabase path.

However — the readFileSync issue still applies:

Even though Supabase buffers on the server side, the Supabase JS SDK (@supabase/storage-js) does accept a NodeJS.ReadableStream as the upload body. So you can still avoid loading the entire temp file into a Buffer in the Node process:

// ❌ Current — loads entire file into memory on the Node side
const fileBuffer = fs.readFileSync(tempFilePath);
await client.storage.from(bucket).upload(storagePath, fileBuffer, { contentType: 'application/json' });

// ✅ Better — SDK sends the stream over HTTP, Supabase buffers on their end
const readStream = fs.createReadStream(tempFilePath);
await client.storage.from(bucket).upload(storagePath, readStream, { contentType: 'application/json' });

The memory pressure moves from your Node process to Supabase's infrastructure — which is exactly where it belongs.

Revised options given this constraint:

Option	Supabase path	R2/S3 path	Notes
A — Truncate at 100 K	✅ Works	✅ Works	Current PR approach
B — Chunked zip	✅ Works (upload each chunk file)	✅ Works	Gives user all data
C — Direct streaming	❌ Not supported	✅ Works via multipart upload	Hybrid approach possible

For Option B on the Supabase path: write each MAX_EXPORT_ROWS-sized chunk to a temp file → upload it → delete temp file → repeat. The user gets a zip with all their data, and your Node process never holds more than one chunk in memory.

Would you like me to plan out the code changes for Option B, keeping the Supabase constraint in mind?

✏️ Learnings added

Learnt from: yash-pouranik
URL: https://github.com/geturbackend/urBackend/pull/254

Timestamp: 2026-06-05T09:00:07.517Z
Learning: In the urBackend project, Supabase storage does NOT support true streaming uploads — it buffers the entire upload server-side. The Supabase JS SDK (`supabase/storage-js`) does accept a `NodeJS.ReadableStream` as the upload body (avoiding `readFileSync` memory spike on the client), but the server will still buffer the full payload. Option C (streaming directly to object storage with no buffering) is only viable for the S3/Cloudflare R2 path via AWS SDK multipart upload, not for the Supabase path.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 188
File: apps/consumer/src/workers/export.worker.js:55-72
Timestamp: 2026-05-28T09:08:02.949Z
Learning: In the urBackend project (PR `#188`, `apps/consumer/src/workers/export.worker.js` and `apps/dashboard-api/src/controllers/dbExport.controller.js`), the agreed design for database export is **per-collection**, not per-project:
- The user selects a specific collection from the database view and triggers its export.
- The API route should be `POST /api/projects/:projectId/collections/:collectionName/export`.
- The BullMQ job payload must include `collectionName`.
- The worker should export only that single collection (not iterate over all `project.collections`).
- Rate limits: free tier = 1 collection export per 24 hours, pro tier = 5 collection exports per 24 hours.

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 0
File: :0-0
Timestamp: 2026-05-17T17:37:13.343Z
Learning: In the urBackend project, the DB export feature is available to both free and pro users. Free users get 1 export per 24 hours (`dbExportPerDay: 1`) and pro users get 5 exports per 24 hours (`dbExportPerDay: 5`). Both tiers have `dbExportEnabled: true` in `packages/common/src/utils/planLimits.js`. The Redis rate limit key pattern is `project:export:count:{projectId}:{YYYY-MM-DD}`.

yash-pouranik · 2026-06-05T09:06:36Z

so this can only be done by supabase anon key?
also I think this can be an another issue
@coderabbitai
please list all the issues in this PR, till this scope only - MAX_EXPORT_ROWS = 100,000
and @anshul23102
please fix this

coderabbitai · 2026-06-05T09:10:46Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat apps/consumer/src/workers/export.worker.js

Length of output: 9032

@yash-pouranik Here's the full breakdown:

On the "Supabase anon key" question

The choice of key (anon vs service_role) is independent of whether you pass a Buffer or a ReadableStream to client.storage.from(bucket).upload(...). Both key types support both upload bodies. The key type only affects Row Level Security (RLS) policies on the bucket — not the upload encoding. So switching from readFileSync (Buffer) to createReadStream (ReadableStream) works regardless of which key your getStorage() client is initialized with.

All open issues scoped to `MAX_EXPORT_ROWS = 100,000`

#	Severity	Location	Issue
1	🔴 High	Supabase path, ~Line 131	`fs.readFileSync` loads the entire export into a Node.js `Buffer` before uploading — negates the whole temp-file strategy
2	🔴 High	Both paths, ~Lines 94–110 & 148–163	Mongoose cursor is not explicitly closed after `break`; must use `try/finally { await cursor.close() }` to prevent MongoDB connection leaks
3	🟡 Medium	Supabase path, ~Lines 118–123	Race condition: `writeStream.end()` is called before the `finish` promise listener is registered — if `finish` fires first, the promise never resolves
4	🟢 Low	Both paths	Entire `for await` cursor loop is copy-pasted between the two provider paths — a single bug fix must be applied twice

Concrete fixes for `@anshul23102`

Fix 1 — Replace readFileSync with createReadStream (Supabase path, ~Line 131):

// Before:
const fileBuffer = fs.readFileSync(tempFilePath);
const { error } = await client.storage.from(bucket).upload(storagePath, fileBuffer, {
    contentType: 'application/json'
});

// After:
const readStream = fs.createReadStream(tempFilePath);
const { error } = await client.storage.from(bucket).upload(storagePath, readStream, {
    contentType: 'application/json'
});

Fix 2 — Explicitly close the cursor in both paths:

const cursor = Model.find().lean().limit(MAX_EXPORT_ROWS + 1).cursor();
try {
    for await (const doc of cursor) {
        if (exportedCount >= MAX_EXPORT_ROWS) {
            wasTruncated = true;
            break;
        }
        // ... write doc ...
    }
} finally {
    await cursor.close(); // safe even if already exhausted
}

Apply this to both the Supabase path and the S3/R2 path.

Fix 3 — Register the finish listener before calling .end() (Supabase path):

// Before:
writeStream.end();
await new Promise((resolve, reject) => {
    writeStream.on('finish', resolve);
    writeStream.on('error', reject);
});

// After:
const finishPromise = new Promise((resolve, reject) => {
    writeStream.on('finish', resolve);
    writeStream.on('error', reject);
});
writeStream.end();
await finishPromise;

Fix 4 (optional, recommended) — Extract shared cursor-streaming helper:

async function streamCursorToWritable(cursor, writable, maxRows) {
    let first = true;
    let exportedCount = 0;
    let wasTruncated = false;
    try {
        for await (const doc of cursor) {
            if (exportedCount >= maxRows) { wasTruncated = true; break; }
            exportedCount++;
            if (!first) await writeChunk(writable, ',\n');
            await writeChunk(writable, `    ${JSON.stringify(doc)}`);
            first = false;
        }
    } finally {
        await cursor.close();
    }
    return { exportedCount, wasTruncated };
}

Then call streamCursorToWritable(cursor, writeStream, MAX_EXPORT_ROWS) in both paths. This ensures any future fix only needs to be applied once.

🧠 Learnings used

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 188
File: apps/consumer/src/workers/export.worker.js:55-72
Timestamp: 2026-05-28T09:08:02.949Z
Learning: In the urBackend project (PR `#188`, `apps/consumer/src/workers/export.worker.js` and `apps/dashboard-api/src/controllers/dbExport.controller.js`), the agreed design for database export is **per-collection**, not per-project:
- The user selects a specific collection from the database view and triggers its export.
- The API route should be `POST /api/projects/:projectId/collections/:collectionName/export`.
- The BullMQ job payload must include `collectionName`.
- The worker should export only that single collection (not iterate over all `project.collections`).
- Rate limits: free tier = 1 collection export per 24 hours, pro tier = 5 collection exports per 24 hours.

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 0
File: :0-0
Timestamp: 2026-05-17T17:37:13.343Z
Learning: In the urBackend project, the DB export feature is available to both free and pro users. Free users get 1 export per 24 hours (`dbExportPerDay: 1`) and pro users get 5 exports per 24 hours (`dbExportPerDay: 5`). Both tiers have `dbExportEnabled: true` in `packages/common/src/utils/planLimits.js`. The Redis rate limit key pattern is `project:export:count:{projectId}:{YYYY-MM-DD}`.

Learnt from: yash-pouranik
Repo: geturbackend/urBackend PR: 0
File: :0-0
Timestamp: 2026-06-05T09:00:07.517Z
Learning: In the urBackend project, Supabase storage does NOT support true streaming uploads — it buffers the entire upload server-side. The Supabase JS SDK (`supabase/storage-js`) does accept a `NodeJS.ReadableStream` as the upload body (avoiding `readFileSync` memory spike on the client), but the server will still buffer the full payload. Option C (streaming directly to object storage with no buffering) is only viable for the S3/Cloudflare R2 path via AWS SDK multipart upload, not for the Supabase path.

yash-pouranik · 2026-06-05T09:52:05Z

@anshul23102

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread apps/consumer/src/workers/export.worker.js Outdated

Comment thread apps/consumer/src/workers/export.worker.js

Comment thread apps/consumer/src/workers/export.worker.js Outdated

anshul23102 closed this Jun 2, 2026

anshul23102 reopened this Jun 2, 2026

yash-pouranik added nsoc26 GSSOC'26 labels Jun 2, 2026

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

yash-pouranik added the backend label Jun 2, 2026

anshul23102 mentioned this pull request Jun 2, 2026

[Bug] consumer export worker streams entire collection with no document count cap, enabling resource exhaustion on collections with millions of documents #249

Open

Conversation

anshul23102 commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem Statement

Root Cause Analysis

Solution Overview

Changes Made

apps/consumer/src/workers/export.worker.js

Type of Change

Testing Done

Environment

Manual Testing Steps

Edge Cases Tested

Screenshots or Demo

Related Issue

Checklist

GSSoC Label Request

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

anshul23102 commented Jun 2, 2026

Uh oh!

yash-pouranik commented Jun 2, 2026

Uh oh!

yash-pouranik commented Jun 2, 2026

Uh oh!

yash-pouranik commented Jun 2, 2026

Uh oh!

anshul23102 commented Jun 2, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

yash-pouranik commented Jun 2, 2026

Uh oh!

yash-pouranik commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 5, 2026

Issue 1 — fs.readFileSync loads the entire export into memory (Supabase path, ~Line 131)

Issue 2 — Mongoose cursor not explicitly closed after break (both paths)

Issue 3 — Race condition: finish listener registered after writeStream.end() (Supabase path)

Issue 4 (Bonus) — Duplicated cursor-iteration logic

Uh oh!

yash-pouranik commented Jun 5, 2026

Uh oh!

coderabbitai Bot commented Jun 5, 2026

PR #254 — Fix Resource Exhaustion in Export Worker

Uh oh!

yash-pouranik commented Jun 5, 2026

Uh oh!

coderabbitai Bot commented Jun 5, 2026

Option A — Keep truncation (current approach)

Option B — Chunked multi-file export 🗂️ (your suggestion)

Option C — Single streamed upload (no temp file, no limit)

Recommended path

Uh oh!

yash-pouranik commented Jun 5, 2026

Uh oh!

coderabbitai Bot commented Jun 5, 2026

Uh oh!

yash-pouranik commented Jun 5, 2026

Uh oh!

coderabbitai Bot commented Jun 5, 2026

anshul23102 commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

yash-pouranik commented Jun 5, 2026 •

edited

Loading

Issue 1 — `fs.readFileSync` loads the entire export into memory (Supabase path, ~Line 131)

Issue 2 — Mongoose cursor not explicitly closed after `break` (both paths)

Issue 3 — Race condition: finish listener registered after `writeStream.end()` (Supabase path)

PR `#254` — Fix Resource Exhaustion in Export Worker

All open issues scoped to `MAX_EXPORT_ROWS = 100,000`

Concrete fixes for `@anshul23102`