Fine-tuning codellama with own dataset for code review

Hi :)
I am planning to fine-tune Code Llama 7B on my own dataset to evaluate how well it works for code review tasks. I came across a paper in which the authors tried a similar approach using data from CodeReviewer (https://arxiv.org/abs/2203.09095).

My plan is to gather data from the pull requests in our repositories and create JSON files in the following format:
{
  "oldf": "... contents of another old file ...",
  "patch": "@@ -25,13 +25,16 @@ ...",   // Diffs from the repo
  "msg": "we call cities + towns ...",
  "id": 12959,
  "y": 1
}
Will this data format work for fine-tuning Code Llama as is, or do I need to convert the dataset into a specific format for it to be compatible?
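To make the question concrete, here is a minimal sketch of one way records in this shape could be flattened into prompt/completion pairs and written out as JSON Lines. The prompt template, the function names (`to_prompt_completion`, `convert`, `to_jsonl`), and the output keys `prompt`/`completion` are assumptions chosen for illustration, not a format required by Code Llama. The sketch also assumes that `y == 1` marks a diff that received a review comment.

```python
import json


def to_prompt_completion(record):
    """Turn one review record into a prompt/completion pair.

    The template below is hypothetical; adjust it to whatever the
    training script actually expects.
    """
    prompt = (
        "Review the following code change.\n\n"
        "### Diff\n"
        f"{record['patch']}\n\n"
        "### Review comment\n"
    )
    return {"prompt": prompt, "completion": record["msg"]}


def convert(records):
    """Keep only records flagged y == 1 (assumed: comment present) with a non-empty msg."""
    return [
        to_prompt_completion(r)
        for r in records
        if r.get("y") == 1 and r.get("msg")
    ]


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(p) for p in pairs)


# Example record in the shape shown above (values are the placeholders from the post).
example = {
    "oldf": "... contents of another old file ...",
    "patch": "@@ -25,13 +25,16 @@ ...",
    "msg": "we call cities + towns ...",
    "id": 12959,
    "y": 1,
}

if __name__ == "__main__":
    print(to_jsonl(convert([example])))
```

In this sketch `oldf` is left out of the prompt to keep sequences short; whether the full old file is needed as context is a separate design choice.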
Thanks in advance

Fine-tuning codellama with own dataset for code review #257
