feat(ml): implement severity ranker training script by anahaaaa · Pull Request #44 · ionfwsrijan/PatchPilot

anahaaaa · 2026-06-06T11:18:39Z

Linked issue

Closes #34

What this PR does

Implements the severity ranker training pipeline for PatchPilot.

This PR adds a training script that loads findings from SQLite, uses persisted ML features where available or reconstructs them otherwise, trains a GradientBoostingClassifier, prints a classification report to evaluate it, and saves the trained model as ranker.pkl for later use by the ranking system.

Type of change

ML tier (if applicable)

Tier 1 — Triage
Tier 2 — Predictive
Tier 3 — Autonomous
Not ML-related

Changes

Backend

Added backend/scripts/train_ranker.py
Loads findings from SQLite
Uses persisted features when available
Falls back to reconstructing features via extract_features()
Maps severity labels to numeric classes
Encodes categorical features using OrdinalEncoder
Trains a GradientBoostingClassifier
Prints classification metrics after training
Saves the trained model using joblib
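
For reviewers who want a mental model of the steps listed above, here is a minimal sketch of the same flow on synthetic data. It is not the code in backend/scripts/train_ranker.py. The feature columns (tool, category, score), the synthetic data generator, the 75/25 split, and the choice to store the encoder next to the model in one joblib file are all assumptions made for illustration.

```python
import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder


def make_synthetic_findings(n=120, seed=0):
    """Build fake findings: two categorical columns, one numeric column."""
    rng = np.random.default_rng(seed)
    # Illustrative values only; the real feature set is not shown in this PR.
    tools = np.array(["semgrep", "bandit", "trivy"], dtype=object)
    categories = np.array(["injection", "secrets", "dependency"], dtype=object)
    X_cat = np.column_stack([rng.choice(tools, n), rng.choice(categories, n)])
    score = rng.uniform(0.0, 10.0, n)
    # Classes 0..3 stand in for numeric severity classes (low..critical).
    y = np.digitize(score, [2.5, 5.0, 7.5])
    return X_cat, score.reshape(-1, 1), y


def train_ranker(X_cat, X_num, y, model_path, seed=0):
    """Encode, train, evaluate, and save; returns the classification report."""
    idx = np.arange(len(y))
    train_idx, test_idx = train_test_split(
        idx, test_size=0.25, random_state=seed, stratify=y
    )
    # Categories unseen during fit are encoded as -1 instead of raising.
    encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    X_train = np.hstack([encoder.fit_transform(X_cat[train_idx]), X_num[train_idx]])
    X_test = np.hstack([encoder.transform(X_cat[test_idx]), X_num[test_idx]])

    model = GradientBoostingClassifier(random_state=seed)
    model.fit(X_train, y[train_idx])

    report = classification_report(
        y[test_idx], model.predict(X_test), zero_division=0
    )
    # Assumption: bundle the encoder with the model so inference can reuse it.
    joblib.dump({"encoder": encoder, "model": model}, model_path)
    return report
```

Calling `train_ranker(*make_synthetic_findings(), "ranker.pkl")` returns the report text and writes the file. Fitting the encoder on the training split only keeps the held-out rows out of the category vocabulary.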

Frontend

None

New dependencies

None

Database / schema changes

None

Testing

How did you test this?

Trained the ranker on a local SQLite database containing more than 50 findings collected from multiple scans.
Verified that the script successfully generates app/ml/models/ranker.pkl.
Verified that the saved model can be loaded successfully using joblib.load().
Verified that the --help flag displays usage instructions.
Verified that feature reconstruction works when no persisted features column is present.
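
The --help check above can be reproduced with a small argparse-based entry point. This is a sketch only: the flag names --db and --out and the default database filename are hypothetical, since the PR description does not list the script's actual arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface for a training script like train_ranker.py."""
    parser = argparse.ArgumentParser(
        prog="train_ranker.py",
        description="Train the severity ranker from findings stored in SQLite.",
    )
    # Hypothetical flags and defaults, chosen for illustration.
    parser.add_argument(
        "--db",
        default="patchpilot.db",
        help="Path to the SQLite database holding the findings.",
    )
    parser.add_argument(
        "--out",
        default="app/ml/models/ranker.pkl",
        help="Where to write the trained model.",
    )
    return parser
```

argparse adds -h/--help automatically, prints the usage text, and exits with status 0, which is what the manual check relies on.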

Checklist

Tested locally end-to-end (training pipeline execution completed successfully)
New ML model falls back gracefully when model file is absent
No new console.error or unhandled Python exceptions introduced
Added or updated tests where applicable
requirements.txt / package.json updated if new dependencies added
New model files (.pkl, .pt, etc.) are gitignored, not committed
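
The "falls back gracefully when model file is absent" item could be satisfied along the following lines. This is a sketch under assumptions, not the project's ranking code: load_ranker, rank_findings, the featurize callback, and the static SEVERITY_ORDER table are hypothetical names introduced here.

```python
from pathlib import Path

# Hypothetical static ordering used when no trained model is available.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def load_ranker(model_path):
    """Return the trained model, or None when no usable model file exists."""
    path = Path(model_path)
    if not path.is_file():
        return None
    try:
        import joblib  # imported lazily; not needed when the file is absent

        return joblib.load(path)
    except Exception:
        # Unreadable or incompatible file: treat it the same as a missing one.
        return None


def rank_findings(findings, model=None, featurize=None):
    """Sort findings from most to least severe."""
    if model is None or featurize is None:
        # Fallback: order by the scanner-reported severity label.
        def label_rank(finding):
            label = str(finding.get("severity", "")).lower()
            return SEVERITY_ORDER.get(label, -1)

        return sorted(findings, key=label_rank, reverse=True)

    # Model path: order by predicted numeric severity class.
    predicted = model.predict(featurize(findings))
    order = sorted(range(len(findings)), key=lambda i: predicted[i], reverse=True)
    return [findings[i] for i in order]
```

With this shape, callers can always write `rank_findings(findings, load_ranker(path), featurize)` and still get a sensible order before any model has been trained.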

Anything reviewers should focus on

The training script supports both current and future database schemas:

If a persisted features column exists, it is used directly.
If it does not exist, features are reconstructed using the existing extract_features() utility, which matches the feature specification defined for the ranker pipeline.

This approach keeps the implementation compatible with both pre- and post-feature-persistence versions of the findings schema.
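
The schema-compatibility logic described above can be sketched with the standard library alone. Several details are assumptions: the table name findings, JSON as the storage format of the features column, the per-row fallback when that column is NULL, and the severity label set. The extract_features() here is a stand-in, not the project's real utility.

```python
import json
import sqlite3

# Hypothetical label mapping; the PR does not list the actual labels.
SEVERITY_TO_CLASS = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def extract_features(finding: dict) -> dict:
    """Stand-in for the project's extract_features(); the real one differs."""
    return {
        "tool": finding.get("tool") or "unknown",
        "title_length": len(finding.get("title") or ""),
    }


def load_training_rows(conn: sqlite3.Connection, table: str = "findings"):
    """Return (feature_dicts, labels), preferring persisted features."""
    conn.row_factory = sqlite3.Row
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is a trusted constant here, never user input.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    has_features = "features" in columns

    feature_dicts, labels = [], []
    for row in conn.execute(f"SELECT * FROM {table}"):
        finding = dict(row)
        label = SEVERITY_TO_CLASS.get((finding.get("severity") or "").lower())
        if label is None:
            continue  # skip findings with a missing or unknown severity
        persisted = finding.get("features") if has_features else None
        if persisted:
            features = json.loads(persisted)  # post-persistence schema
        else:
            features = extract_features(finding)  # reconstruct on the fly
        feature_dicts.append(features)
        labels.append(label)
    return feature_dicts, labels
```

Detecting the column once per run keeps the per-row logic simple, and the same function works unchanged before and after the features column is introduced.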

feat(ml): implement severity ranker training script

0617d7c

anahaaaa wants to merge 1 commit into ionfwsrijan:main from anahaaaa:severity-ranker-model
