feat(ml): implement severity ranker training script #44

Open
anahaaaa wants to merge 1 commit into ionfwsrijan:main from anahaaaa:severity-ranker-model
Conversation

anahaaaa (Contributor) commented Jun 6, 2026

Linked issue

Closes #34

What this PR does

Implements the severity ranker training pipeline for PatchPilot.

This PR adds a training script that loads findings from SQLite, uses persisted ML features or reconstructs them when none are stored, trains a GradientBoostingClassifier, prints a classification report, and saves the trained model as ranker.pkl for later use by the ranking system.

Type of change

  • Bug fix
  • New feature
  • ML model / training pipeline
  • Refactor (no behaviour change)
  • Documentation
  • Tests only

ML tier (if applicable)

  • Tier 1 — Triage
  • Tier 2 — Predictive
  • Tier 3 — Autonomous
  • Not ML-related

Changes

Backend

  • Added backend/scripts/train_ranker.py
  • Loads findings from SQLite
  • Uses persisted features when available
  • Falls back to reconstructing features via extract_features()
  • Maps severity labels to numeric classes
  • Encodes categorical features using OrdinalEncoder
  • Trains a GradientBoostingClassifier
  • Prints classification metrics after training
  • Saves the trained model using joblib
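
For reviewers who want a picture of the encode, train, report, and save steps listed above, here is a minimal sketch. It is not the code in train_ranker.py: the label mapping, the feature columns, the train/test split, and the choice to store the encoder next to the model in one file are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical label mapping; the real class order is defined by the script.
SEVERITY_TO_CLASS = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def make_synthetic_findings(n=40):
    """Build placeholder rows: [scanner, language, score, count] (all hypothetical)."""
    names = list(SEVERITY_TO_CLASS)
    rows, labels = [], []
    for i in range(n):
        rows.append([["semgrep", "bandit"][i % 2], ["py", "js"][(i // 2) % 2],
                     float(i % 4), i % 7])
        labels.append(names[i % 4])
    return rows, labels


def train_ranker(rows, labels, categorical_idx, model_path):
    """Encode categorical columns, fit the classifier, print metrics, save."""
    X = np.array(rows, dtype=object)
    y = np.array([SEVERITY_TO_CLASS[label] for label in labels])

    # Categories unseen during training are encoded as -1 instead of raising.
    encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    X[:, categorical_idx] = encoder.fit_transform(X[:, categorical_idx])
    X = X.astype(float)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test), zero_division=0))

    # Assumption: keep the encoder with the model so inference can reuse it.
    joblib.dump({"model": model, "encoder": encoder}, model_path)
    return model, encoder


if __name__ == "__main__":
    demo_rows, demo_labels = make_synthetic_findings()
    out_path = Path(tempfile.mkdtemp()) / "ranker.pkl"
    train_ranker(demo_rows, demo_labels, [0, 1], out_path)
```

Saving the encoder together with the model is one way to make sure inference applies the same category codes as training. If the script saves only the classifier, the encoder would need to be rebuilt or stored elsewhere.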

Frontend

  • None

New dependencies

  • None

Database / schema changes

  • None

Testing

How did you test this?

  • Trained the ranker on a local SQLite database containing more than 50 findings collected from multiple scans.
  • Verified that the script successfully generates app/ml/models/ranker.pkl.
  • Verified that the saved model can be loaded successfully using joblib.load().
  • Verified that the --help flag displays usage instructions.
  • Verified that feature reconstruction works when no persisted features column is present.

Checklist

  • Tested locally end-to-end (training pipeline execution completed successfully)
  • New ML model falls back gracefully when model file is absent
  • No new console.error or unhandled Python exceptions introduced
  • Added or updated tests where applicable
  • requirements.txt / package.json updated if new dependencies added
  • New model files (.pkl, .pt, etc.) are gitignored, not committed
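
As a reference for the "falls back gracefully when model file is absent" item, here is one possible shape for that behaviour. This is a sketch under assumptions, not code from this PR: `load_ranker`, `rank_findings`, the `SEVERITY_ORDER` table, and the `features` key on each finding are hypothetical names, and the model branch assumes the saved object is a bare classifier whose higher predicted class means higher severity.

```python
from pathlib import Path

# Hypothetical static order used when no trained model is available.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def load_ranker(model_path):
    """Return the saved model, or None when the file is absent."""
    path = Path(model_path)
    if not path.is_file():
        return None
    # Imported lazily so the fallback path works without ML dependencies.
    import joblib
    return joblib.load(path)


def rank_findings(findings, model_path="app/ml/models/ranker.pkl"):
    """Return findings sorted most severe first."""
    model = load_ranker(model_path)
    if model is None:
        # Fallback: sort by the reported severity label; unknown labels go last.
        return sorted(
            findings,
            key=lambda f: SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)),
        )
    # Assumption: each finding carries a numeric feature vector under "features".
    predicted = model.predict([f["features"] for f in findings])
    ranked = sorted(zip(predicted, findings), key=lambda pair: pair[0], reverse=True)
    return [finding for _, finding in ranked]
```

With this layout a fresh checkout with no ranker.pkl still returns a deterministic ordering, and the trained model takes over once the file exists.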

Anything reviewers should focus on

The training script supports both current and future database schemas:

  • If a persisted features column exists, it is used directly.
  • If it does not exist, features are reconstructed using the existing extract_features() utility, which matches the feature specification defined for the ranker pipeline.

This approach keeps the implementation compatible with both pre- and post-feature-persistence versions of the findings schema.
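
The schema check described above could look roughly like the following. Treat it as an illustration only: the table name `findings`, the JSON encoding of the `features` column, and the body of `extract_features()` here are assumptions, and the real utility in the repository will differ.

```python
import json
import sqlite3


def extract_features(finding):
    """Hypothetical stand-in for the project's extract_features() utility."""
    return {"rule_id": finding["rule_id"], "message_length": len(finding["message"])}


def has_column(conn, table, column):
    """Check the live schema; PRAGMA table_info yields one row per column."""
    # The table name is a fixed constant in this sketch, never user input.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # index 1 is the column name


def load_features(conn):
    """Use persisted features when the column exists, else reconstruct them."""
    use_persisted = has_column(conn, "findings", "features")
    cursor = conn.cursor()
    cursor.row_factory = sqlite3.Row  # allows dict(row) below
    features = []
    for row in cursor.execute("SELECT * FROM findings ORDER BY rowid"):
        finding = dict(row)
        if use_persisted and finding["features"]:
            # Assumption: persisted features are stored as JSON text.
            features.append(json.loads(finding["features"]))
        else:
            # Old schema, or a row saved before features were persisted.
            features.append(extract_features(finding))
    return features
```

Checking the schema at runtime lets the same script run against databases created before and after the features column is introduced. The per-row fallback also covers rows whose features value is empty.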

Development

Successfully merging this pull request may close these issues.

Build and train the severity ranker model