test(datasets): add seed script for generating mock datasets#2728
Open
Junjiequan wants to merge 2 commits into
Open
test(datasets): add seed script for generating mock datasets#2728Junjiequan wants to merge 2 commits into
Junjiequan wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider parameterizing
MONGO_URI,DB_NAME,COLLECTION, and possiblyTOTALvia environment variables or CLI flags instead of hard-coding them so the script can be reused across different environments without edits. - The computation of
TOTAL_GROUPSusesrandomInt(5, 10)at module load time, which makes the number of groups vary between runs; if you want reproducible seeding or easier reasoning about group distribution, derive this from a fixed value or make it configurable. - Wrap the Mongo client lifecycle in a
try/finally(or similar) to ensureclient.close()is always called even if an error occurs during batch inserts or sampling.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider parameterizing `MONGO_URI`, `DB_NAME`, `COLLECTION`, and possibly `TOTAL` via environment variables or CLI flags instead of hard-coding them so the script can be reused across different environments without edits.
- The computation of `TOTAL_GROUPS` uses `randomInt(5, 10)` at module load time, which makes the number of groups vary between runs; if you want reproducible seeding or easier reasoning about group distribution, derive this from a fixed value or make it configurable.
- Wrap the Mongo client lifecycle in a `try/finally` (or similar) to ensure `client.close()` is always called even if an error occurs during batch inserts or sampling.
## Individual Comments
### Comment 1
<location path="scripts/seed/seed-datasets.js" line_range="108-111" />
<code_context>
+ const humanName = Math.random() < 0.5 && {
+ human_name: randomString(randomInt(5, 15)),
+ };
+ metadata[key] = {
+ value: randomInt(1, 999999),
+ unit: randomItem(UNITS),
+ ...(humanName !== false && humanName),
+ };
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** The spread of `humanName !== false && humanName` can evaluate to `...false`, which will throw at runtime.
In this branch `humanName` is either an object or `false`, so `humanName !== false && humanName` evaluates to that object or `false`. Using it in `...(humanName !== false && humanName)` can therefore try to spread `false`. Instead, build an object and spread that, for example:
```js
const humanName = Math.random() < 0.5
? { human_name: randomString(randomInt(5, 15)) }
: {};
metadata[key] = {
value: randomInt(1, 999999),
unit: randomItem(UNITS),
...humanName,
};
```
or keep the short-circuit style but normalize to an object:
```js
const humanName = Math.random() < 0.5 && {
human_name: randomString(randomInt(5, 15)),
};
metadata[key] = {
value: randomInt(1, 999999),
unit: randomItem(UNITS),
...(humanName || {}),
};
```
The shared-key branch should be adjusted in the same way.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
First draft seed script for local development that randomly generates
mock datasets with random scientific metadata for query performance testing.
Motivation
Fixes
Changes:
Tests included
Documentation
official documentation info
Summary by Sourcery
Add a MongoDB seed script for generating large volumes of mock datasets with realistic scientific metadata for local performance testing.
New Features:
Documentation: