Worked on issue 33: Extract and store ML features for each finding #41
Merged
@ionfwsrijan PR #41 is ready to merge! All changes from issue #33 have been implemented and tested locally. The feature extraction system is working as expected with all 7 ML features being populated correctly. Just need your approval to trigger the workflow and complete the merge.
Owner
@Krish-Mishra Please run `ruff format` on the code to fix these failing checks.
Contributor
Author
OK, will do.
Contributor
Author
Done, please check now. Thank you.
Owner
LGTM, merging it now.
Linked issue
Closes #33
What this PR does
This PR resolves issue #33, "Extract and store ML features for each finding".
It adds a `features` column (JSON blob) to the findings table and populates it with the required data.
It meets all the acceptance criteria listed in the issue.
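As a rough illustration of what a per-finding features blob of this kind could look like, here is a minimal sketch. It is not the code from this PR: the function signature, the input arguments, and every derivation rule (how depth is counted, what counts as a test file, how the rule prefix is split) are assumptions made for the example. Only the seven feature names come from the PR description.

```python
import json
import re
from pathlib import PurePosixPath
from typing import Any, Dict, Optional


def extract_features(
    scanner: str,
    file_path: str,
    rule_id: str,
    severity: str,
    cwe: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch: build the seven named features for one finding."""
    # Normalise Windows separators so depth is counted consistently.
    path = PurePosixPath(file_path.replace("\\", "/"))
    parts = [p for p in path.parts if p != "/"]
    dirs = [p.lower() for p in parts[:-1]]
    name = parts[-1].lower() if parts else ""

    return {
        # Assumption: findings without a CWE get the placeholder "unknown".
        "cwe_category": cwe.upper() if cwe else "unknown",
        "file_extension": path.suffix.lower().lstrip("."),
        # Assumption: depth is the number of directories above the file.
        "path_depth": len(dirs),
        "scanner": scanner.lower(),
        "raw_severity": severity.upper(),
        # Assumption: a test file lives under test/ or tests/, or is named test_*.
        "is_test_file": name.startswith("test_")
        or any(d in {"test", "tests"} for d in dirs),
        # Assumption: the prefix is the text before the first "." or "-".
        "rule_id_prefix": re.split(r"[.\-]", rule_id, maxsplit=1)[0],
    }


if __name__ == "__main__":
    features = extract_features(
        scanner="semgrep",
        file_path="app/tests/test_auth.py",
        rule_id="python.lang.security.eval",
        severity="ERROR",
        cwe="CWE-95",
    )
    # The dictionary is JSON-serialisable, so it can be stored as a blob.
    print(json.dumps(features, sort_keys=True, indent=2))
```

Keeping every value a plain string, integer, or boolean means the dictionary can be serialised with `json.dumps` and stored in a JSON column without custom encoders.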
Type of change
ML tier (if applicable)
Changes
Backend
- Added the `extract_features` utility in `app/utils/ml_features.py` to calculate 7 structured ML features per finding (`cwe_category`, `file_extension`, `path_depth`, `scanner`, `raw_severity`, `is_test_file`, `rule_id_prefix`).
- Feature extraction runs in `gitleaks.py`, `osv.py`, and `semgrep.py` prior to model validation.
- Updated the `Finding` Pydantic model in `app/models.py` to include an optional `features` dictionary, which automatically exposes the extracted data to the `/jobs/{job_id}/findings` API endpoint payload.
Frontend
New dependencies
Database / schema changes
- Updated the `Finding` Pydantic schema in `app/models.py` to include `features: Optional[Dict[str, Any]] = Field(default_factory=dict)`, which stores the JSON blob at insert time without breaking backward compatibility for older records.
Testing
How did you test this?
- Ran the test suite (`pytest tests/`) to verify that the API response serialization remains intact with the new schema update.
- Started the server locally (`uvicorn`) and ran dummy scans to verify that the `features` object populates correctly in the JSON response of the `/jobs/{job_id}/findings` endpoint.
Checklist
- No `console.error` or unhandled Python exceptions introduced
- `requirements.txt` / `package.json` updated if new dependencies added
- Model files (`.pkl`, `.pt`, etc.) are gitignored, not committed
Anything reviewers should focus on
Screenshots (if UI changed)