Open
Labels: enhancement (New feature or request)
Description
Add the GenEval benchmark, with its six subcategories, for compositional text-to-image evaluation.
Details
- Source: GitHub JSON from the `djghosh13/geneval` repo
- Subcategories: `single_object`, `two_object`, `counting`, `colors`, `position`, `color_attr`
- Collate: `prompt_with_auxiliaries_collate`
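The source format and the category filter can be illustrated with a minimal sketch that parses GenEval-style JSON Lines metadata and keeps a single subcategory. It assumes each record has a `prompt` string and a `tag` naming its subcategory. `load_geneval_records` and `collate_prompts_with_auxiliaries` are hypothetical helpers, not Pruna's actual loader or its `prompt_with_auxiliaries_collate`, and the sample records are made up for illustration.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple

# The six GenEval subcategories listed in this issue.
GENEVAL_CATEGORIES = (
    "single_object",
    "two_object",
    "counting",
    "colors",
    "position",
    "color_attr",
)


def load_geneval_records(
    lines: Iterable[str], category: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Parse JSON Lines metadata, optionally keeping only one subcategory.

    Hypothetical helper: assumes every record has "prompt" and "tag" keys.
    """
    if category is not None and category not in GENEVAL_CATEGORIES:
        raise ValueError(f"Unknown GenEval category: {category!r}")
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if category is None or record["tag"] == category:
            records.append(record)
    return records


def collate_prompts_with_auxiliaries(
    batch: List[Dict[str, Any]],
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Hypothetical collate: split each record into its prompt and all other fields."""
    prompts = [item["prompt"] for item in batch]
    auxiliaries = [{k: v for k, v in item.items() if k != "prompt"} for item in batch]
    return prompts, auxiliaries


# Usage with two made-up records.
sample = [
    '{"tag": "counting", "include": [{"class": "cat", "count": 3}], "prompt": "a photo of three cats"}',
    '{"tag": "colors", "include": [{"class": "car", "count": 1, "color": "red"}], "prompt": "a photo of a red car"}',
]
counting = load_geneval_records(sample, category="counting")
prompts, aux = collate_prompts_with_auxiliaries(counting)
```

Keeping every field except the prompt as an auxiliary means metadata such as `tag` reaches the evaluation step without being passed to the model.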
Implementation
- Add `setup_geneval_dataset` in `src/pruna/data/datasets/prompt.py`
- Support a `category` param for filtering subcategories
- Register in `base_datasets`
- Add a `BenchmarkInfo` entry with metrics `["qa_accuracy"]` and a subsets list
- Auxiliaries should include a `questions` list and a `tag` for evaluation
- Add a test
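The implementation steps can be sketched without depending on Pruna internals. In the sketch below, `BenchmarkInfo` and `base_datasets` are hypothetical stand-ins whose real definitions in Pruna may differ, and `setup_geneval_dataset` works on in-memory records rather than downloading the JSON or producing dataset splits. The issue does not say where the `questions` list comes from, so the sketch assumes it is already present on each record and falls back to an empty list.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

GENEVAL_CATEGORIES = (
    "single_object",
    "two_object",
    "counting",
    "colors",
    "position",
    "color_attr",
)


@dataclass
class BenchmarkInfo:
    """Hypothetical stand-in for Pruna's BenchmarkInfo; field names are assumptions."""

    name: str
    metrics: List[str]
    subsets: List[str] = field(default_factory=list)


# Hypothetical stand-in for the base_datasets registry: name -> setup function.
base_datasets: Dict[str, Callable[..., List[Dict[str, Any]]]] = {}


def setup_geneval_dataset(
    records: List[Dict[str, Any]], category: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Build samples with prompt plus questions/tag auxiliaries, optionally filtered."""
    if category is not None and category not in GENEVAL_CATEGORIES:
        raise ValueError(f"Unknown GenEval category: {category!r}")
    samples = []
    for record in records:
        if category is not None and record["tag"] != category:
            continue  # drop records from other subcategories
        samples.append(
            {
                "prompt": record["prompt"],
                # Assumption: questions are already on the record; default to none.
                "questions": list(record.get("questions", [])),
                "tag": record["tag"],
            }
        )
    return samples


# Register the setup function and describe the benchmark.
base_datasets["GenEval"] = setup_geneval_dataset
GENEVAL_INFO = BenchmarkInfo(
    name="GenEval", metrics=["qa_accuracy"], subsets=list(GENEVAL_CATEGORIES)
)

# Usage with made-up records.
records = [
    {"tag": "counting", "prompt": "a photo of three cats", "questions": ["Are there three cats?"]},
    {"tag": "position", "prompt": "a photo of a dog left of a chair"},
]
all_samples = base_datasets["GenEval"](records)
counting_samples = base_datasets["GenEval"](records, category="counting")
```

Validating `category` against the known subsets up front turns a typo into an immediate error instead of a silently empty dataset.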
Acceptance Criteria
- `PrunaDataModule.from_string("GenEval")` works (all subcategories)
- `PrunaDataModule.from_string("GenEval", category="counting")` works
- Auxiliaries include `questions` and `tag` fields
- Test passes