Worked on issue 33 : Extract and store ML features for each finding by Krish-Mishra · Pull Request #41 · ionfwsrijan/PatchPilot

Krish-Mishra · 2026-06-05T13:03:12Z

Linked issue

Closes #33

What this PR does

This PR resolves issue #33, "Extract and store ML features for each finding".
It adds a features column (a JSON blob) to the findings table and populates it with the required data.

It meets all the acceptance criteria mentioned in the issue.

Type of change

ML tier (if applicable)

[✅] Tier 1 — Triage
Tier 2 — Predictive
Tier 3 — Autonomous
Not ML-related

Changes

Backend

Added a centralized extract_features utility in app/utils/ml_features.py to calculate 7 structured ML features per finding (cwe_category, file_extension, path_depth, scanner, raw_severity, is_test_file, rule_id_prefix).
Injected the feature extraction logic into the data ingestion pipelines for gitleaks.py, osv.py, and semgrep.py prior to model validation.
Updated the Finding Pydantic model in app/models.py to include an optional features dictionary, which automatically exposes the extracted data to the /jobs/{job_id}/findings API endpoint payload.
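
To make the seven features concrete, here is a minimal sketch of what a utility like extract_features could look like. It is not the code merged in this PR: the function signature, the test-file heuristic, the delimiter set used for rule_id_prefix, and the "unknown" fallbacks are all assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath
from typing import Any, Dict, Optional

# Hypothetical directory names treated as test locations (an assumption).
_TEST_DIRS = {"test", "tests", "spec", "__tests__"}


def extract_features(
    scanner: str,
    file_path: str,
    rule_id: str,
    severity: Optional[str] = None,
    cwe: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the seven per-finding features named in the PR description."""
    # Normalise Windows separators so the path splits consistently.
    path = PurePosixPath(file_path.replace("\\", "/"))
    parts = [p for p in path.parts if p not in ("/", ".")]
    directories = parts[:-1]
    stem = path.stem.lower()

    # Assumed heuristic: a test directory or a test-like file name.
    is_test_file = (
        any(d.lower() in _TEST_DIRS for d in directories)
        or stem.startswith("test_")
        or stem.endswith("_test")
    )

    # Keep only the "CWE-<number>" token if one is present.
    match = re.search(r"CWE-\d+", cwe or "", re.IGNORECASE)
    cwe_category = match.group(0).upper() if match else "unknown"

    # Assumed prefix rule: text before the first '.', '-', '_', '/' or ':'.
    rule_id_prefix = re.split(r"[.\-_/:]", rule_id, maxsplit=1)[0]

    return {
        "cwe_category": cwe_category,
        "file_extension": path.suffix.lower(),
        "path_depth": len(directories),
        "scanner": scanner,
        "raw_severity": severity if severity is not None else "unknown",
        "is_test_file": is_test_file,
        "rule_id_prefix": rule_id_prefix,
    }
```

Under these assumptions, each scanner parser (gitleaks.py, osv.py, semgrep.py) would call the function once per raw result and attach the returned dict to the finding before the model is validated.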

Frontend

New dependencies

Database / schema changes

Modified the Finding Pydantic schema in app/models.py to include features: Optional[Dict[str, Any]] = Field(default_factory=dict) to store the JSON blob at insert time without breaking backward compatibility for older records.
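
The backward-compatibility claim can be illustrated with a small sketch. Only the features field below is taken from this PR; the other fields (scanner, rule_id, file_path) are a hypothetical subset standing in for whatever the real Finding model in app/models.py declares.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field


class Finding(BaseModel):
    # Hypothetical subset of fields, for illustration only.
    scanner: str
    rule_id: str
    file_path: str
    # Field as described in the PR: optional, defaulting to an empty dict.
    features: Optional[Dict[str, Any]] = Field(default_factory=dict)


# A record created without features (for example, an older one) still validates.
old_record = Finding(scanner="osv", rule_id="example-rule", file_path="requirements.txt")

# A newly ingested record carries the extracted features.
new_record = Finding(
    scanner="semgrep",
    rule_id="example-rule",
    file_path="app/main.py",
    features={"path_depth": 1, "file_extension": ".py"},
)
```

Using default_factory=dict rather than a literal {} gives every record its own empty dict, so mutating one finding's features cannot leak into another.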

Testing

How did you test this?

Executed the existing automated test suite (pytest tests/) to verify that the API response serialization remains intact with the new schema update.
Conducted a local end-to-end test by spinning up the backend (uvicorn) and running dummy scans to verify the features object populates correctly in the JSON response of the /jobs/{job_id}/findings endpoint.
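
The manual check of the JSON response could also be scripted. The sketch below assumes the /jobs/{job_id}/findings payload can be reduced to a list of finding objects, each with an optional features key; the real response shape may differ, and missing_feature_keys is a hypothetical helper, not part of this PR.

```python
from typing import Any, Dict, List

# The seven feature names listed in the PR description.
EXPECTED_FEATURE_KEYS = {
    "cwe_category",
    "file_extension",
    "path_depth",
    "scanner",
    "raw_severity",
    "is_test_file",
    "rule_id_prefix",
}


def missing_feature_keys(findings: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Map the index of each incomplete finding to its sorted missing keys."""
    problems: Dict[int, List[str]] = {}
    for index, finding in enumerate(findings):
        # Treat an absent or null features value as an empty dict.
        features = finding.get("features") or {}
        missing = sorted(EXPECTED_FEATURE_KEYS - set(features))
        if missing:
            problems[index] = missing
    return problems
```

An empty result means every finding in the payload carries all seven features.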

Checklist

[✅] Tested locally end-to-end (upload ZIP or GitHub URL → scan → findings returned correctly)
[✅] New ML model falls back gracefully when model file is absent
[✅] No new console.error or unhandled Python exceptions introduced
[✅] Added or updated tests where applicable
[✅] requirements.txt / package.json updated if new dependencies added
[✅] New model files (.pkl, .pt, etc.) are gitignored, not committed

Anything reviewers should focus on

Screenshots (if UI changed)

Tushar-sonawane06 · 2026-06-05T14:03:58Z

@ionfwsrijan PR #41 is ready to merge! All changes from issue #33 have been implemented and tested locally. The feature extraction system is working as expected with all 7 ML features being populated correctly. Just need your approval to trigger the workflow and complete the merge.

ionfwsrijan · 2026-06-05T14:24:19Z

@Krish-Mishra Pls ruff format the code to fix these failing checks.

Krish-Mishra · 2026-06-05T14:52:52Z

ok will do

Krish-Mishra · 2026-06-05T14:58:37Z

Done, now please check

Thank You

ionfwsrijan · 2026-06-05T15:38:07Z

LGTM merging it now

Worked on issue 33 : Extract and store ML features for each finding

8bae3cd

fixed formatting issues

30a92a3

ionfwsrijan merged commit 43203dd into ionfwsrijan:main Jun 5, 2026
6 checks passed

ionfwsrijan merged 2 commits into ionfwsrijan:main from Krish-Mishra:main

3 participants
