Skip to content

refactor: simplify job validation and reduce db calls#2714

Open
minottic wants to merge 1 commit into
job_authfrom
job_valid
Open

refactor: simplify job validation and reduce db calls#2714
minottic wants to merge 1 commit into
job_authfrom
job_valid

Conversation

@minottic
Copy link
Copy Markdown
Member

@minottic minottic commented May 4, 2026

Description

Refactor job validation by reducing DB calls and using class-transformer validation. Also use more specific error codes.

Depends on #2713

Tests included

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated

official documentation info

Summary by Sourcery

Refactor job validation logic to rely on DTO validation and more specific HTTP errors while reducing database usage for dataset and file checks.

Bug Fixes:

  • Ensure dataset and file validation errors return HTTP 422 Unprocessable Entity instead of generic bad request responses.

Enhancements:

  • Leverage class-transformer and class-validator to validate dataset list DTOs instead of manual checks.
  • Optimize dataset PID and file existence checks by reducing and consolidating database queries and projecting only required fields.

Tests:

  • Update job-related integration tests to expect HTTP 422 status codes for invalid job requests.

@minottic minottic requested a review from a team as a code owner May 4, 2026 12:55
Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The UnprocessableEntityException responses now mix plain string messages and { message, error } objects (e.g., in dataset list validation vs checkDatasetFiles), which can make the API error shape inconsistent for clients; consider standardizing the payload format for these validation errors.
  • In checkDatasetFiles, origsMappedByDatasetId is keyed by orig.datasetId but later accessed using pid (origsMappedByDatasetId[pid]), which assumes datasetId and pid are identical; if they differ in your schema, this lookup will silently fail and should be adjusted or made explicit.
  • The dataset list validation now returns JSON.stringify(validateErrors) directly in the error message, which can be verbose and leak internal structure; consider mapping the class-validator errors to a more concise, user-focused error format before including them in the response.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `UnprocessableEntityException` responses now mix plain string messages and `{ message, error }` objects (e.g., in dataset list validation vs `checkDatasetFiles`), which can make the API error shape inconsistent for clients; consider standardizing the payload format for these validation errors.
- In `checkDatasetFiles`, `origsMappedByDatasetId` is keyed by `orig.datasetId` but later accessed using `pid` (`origsMappedByDatasetId[pid]`), which assumes `datasetId` and `pid` are identical; if they differ in your schema, this lookup will silently fail and should be adjusted or made explicit.
- The dataset list validation now returns `JSON.stringify(validateErrors)` directly in the error message, which can be verbose and leak internal structure; consider mapping the `class-validator` errors to a more concise, user-focused error format before including them in the response.

## Individual Comments

### Comment 1
<location path="src/jobs/jobs.controller.utils.ts" line_range="98-99" />
<code_context>
+      DatasetListDto,
+      datasetList,
+    );
+    const nestedErrors = await Promise.all(
+      datasetListDtos.map((dto) => validate(dto)),
+    );
+    const validateErrors = nestedErrors.flat();
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Avoid serializing full validation errors into the response to prevent overly verbose and potentially sensitive output.

`ValidationError` instances can include the full input (`target`) and other metadata, so serializing them directly into the exception message makes responses noisy and may leak user data in HTTP responses/logs. Instead, derive a minimal structure from `validateErrors` (e.g. `{ property, constraints }`) or rely on Nest’s validation pipe to surface only high-level messages, rather than `JSON.stringify(validateErrors)`.

Suggested implementation:

```typescript
    const datasetListDtos: DatasetListDto[] = plainToInstance(
      DatasetListDto,
      datasetList,
    );
    const nestedErrors = await Promise.all(
      datasetListDtos.map((dto) => validate(dto)),
    );

    const validateErrors = nestedErrors.flat();
    if (validateErrors.length > 0) {
      const minimalErrors = validateErrors.map(({ property, constraints }) => ({
        property,
        constraints,
      }));

      throw new UnprocessableEntityException({
        message: 'Invalid dataset list.',
        errors: minimalErrors,
      });
    }

```

If this endpoint is already wrapped by a global validation pipe, consider whether you still need this manual validation logic. If you do, you might want to align the error shape (`message`/`errors`) with the rest of your API error format (e.g. add an application-specific `code` or `statusCode` field) so responses are consistent.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/jobs/jobs.controller.utils.ts
@minottic minottic force-pushed the job_auth branch 2 times, most recently from 09017bb to af64aed Compare May 8, 2026 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant