feat: add GEditBench benchmark with task type subsets #511

## Summary
Add GEditBench benchmark for image editing evaluation with 11 task type subsets.

## Dataset
- **Source:** `stepfun-ai/GEdit-Bench` (HuggingFace)
- **Task types:** background_change, color_alter, material_alter, motion_change, ps_human, style_change, subject_add, subject_remove, subject_replace, text_change, tone_transfer
- **Collate:** `prompt_with_auxiliaries_collate`

## Implementation
- Add `setup_gedit_dataset` in `src/pruna/data/datasets/prompt.py`
- Support an optional `subset` parameter that filters the dataset to a single task type
- Keep only samples with English instructions
- Register in `base_datasets`
- Add a `BenchmarkInfo` entry with metrics `["accuracy"]` and the list of subsets
- Auxiliaries should include `image` (from `input_image`) and `subset`
- Add a test
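
The filtering described above could look roughly like the sketch below. It is a plain-Python illustration and not the actual `setup_gedit_dataset` implementation: the helper name `filter_gedit_records` is hypothetical, and the column names `instruction`, `instruction_language`, `task_type`, and `input_image` are assumptions about the dataset schema that should be checked against `stepfun-ai/GEdit-Bench`.

```python
from typing import Any, Dict, Iterable, List, Optional

# The 11 task types listed in the Dataset section of this issue.
GEDIT_TASK_TYPES = (
    "background_change", "color_alter", "material_alter", "motion_change",
    "ps_human", "style_change", "subject_add", "subject_remove",
    "subject_replace", "text_change", "tone_transfer",
)


def filter_gedit_records(
    records: Iterable[Dict[str, Any]], subset: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Keep English records, optionally restricted to one task type.

    Hypothetical helper; the input column names are assumed, not confirmed.
    """
    if subset is not None and subset not in GEDIT_TASK_TYPES:
        raise ValueError(f"Unknown subset {subset!r}; expected one of {GEDIT_TASK_TYPES}")

    kept = []
    for record in records:
        # Drop non-English instructions (assumed column: instruction_language).
        if record["instruction_language"] != "en":
            continue
        # Apply the optional task type filter (assumed column: task_type).
        if subset is not None and record["task_type"] != subset:
            continue
        # Expose the prompt text plus the two auxiliaries named in this issue.
        kept.append(
            {
                "text": record["instruction"],
                "image": record["input_image"],
                "subset": record["task_type"],
            }
        )
    return kept


# Tiny in-memory stand-in for the real dataset rows.
sample_records = [
    {"instruction": "Make the sky pink", "instruction_language": "en",
     "task_type": "background_change", "input_image": "img_0"},
    {"instruction": "把天空变成粉色", "instruction_language": "cn",
     "task_type": "background_change", "input_image": "img_1"},
    {"instruction": "Add a dog", "instruction_language": "en",
     "task_type": "subject_add", "input_image": "img_2"},
]

all_english = filter_gedit_records(sample_records)
background_only = filter_gedit_records(sample_records, subset="background_change")
```

Rejecting unknown subset names up front is a design choice in this sketch; it surfaces typos in `subset` instead of silently returning an empty dataset.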

## Acceptance
- `PrunaDataModule.from_string("GEditBench")` loads the dataset with all subsets
- `PrunaDataModule.from_string("GEditBench", subset="background_change")` loads only the `background_change` subset
- Auxiliaries include `image`, `subset` fields
- Test passes
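
One way the auxiliary checks in the criteria above might be expressed in a test is sketched below. The helper `find_auxiliary_problems` is hypothetical, and the sketch assumes the auxiliaries arrive as a list of dictionaries, one per sample; the real batch structure produced by `prompt_with_auxiliaries_collate` should be confirmed before reusing it.

```python
from typing import Any, Dict, List, Optional

REQUIRED_AUX_FIELDS = ("image", "subset")


def find_auxiliary_problems(
    auxiliaries: List[Dict[str, Any]], expected_subset: Optional[str] = None
) -> List[str]:
    """Return human-readable problems; an empty list means the batch passes.

    Hypothetical helper: assumes one auxiliary dict per sample.
    """
    problems = []
    for index, aux in enumerate(auxiliaries):
        # Every sample must carry both fields named in the acceptance criteria.
        for field in REQUIRED_AUX_FIELDS:
            if field not in aux:
                problems.append(f"sample {index}: missing '{field}'")
        # When a subset was requested, no other task type may leak through.
        if expected_subset is not None and aux.get("subset") != expected_subset:
            problems.append(
                f"sample {index}: subset {aux.get('subset')!r} != {expected_subset!r}"
            )
    return problems


good_batch = [
    {"image": "img_0", "subset": "background_change"},
    {"image": "img_3", "subset": "background_change"},
]
bad_batch = [
    {"image": "img_0", "subset": "subject_add"},
    {"subset": "background_change"},
]

good_problems = find_auxiliary_problems(good_batch, expected_subset="background_change")
bad_problems = find_auxiliary_problems(bad_batch, expected_subset="background_change")
```

Collecting problems into a list, instead of failing on the first one, lets a single test run report every offending sample.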
