
feat: add LongTextBench benchmark #506

Open

davidberenstein1957 wants to merge 9 commits into main from feat/add-longtextbench-benchmark

Conversation


davidberenstein1957 (Member) commented on Jan 31, 2026

Closes #513

Summary

  • Add LongTextBench for evaluating text-to-image models on long, complex prompts
  • Uses the X-Omni/LongText-Bench dataset from Hugging Face
  • Preserves the text_content field in auxiliaries for text_score evaluation

Changes

  • Add setup_long_text_bench_dataset in src/pruna/data/datasets/prompt.py (a sketch follows this list)
  • Register it in base_datasets in src/pruna/data/__init__.py
  • Add a BenchmarkInfo entry with metrics: ["text_score"]
  • Add tests for basic loading and auxiliaries
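The loader might look roughly like this: a minimal sketch assuming the Hugging Face datasets API, a test split on the hub, and a seed/num_samples signature following the other setup functions in prompt.py (none of this is copied from the diff):

```python
from typing import Optional

from datasets import Dataset, load_dataset


def setup_long_text_bench_dataset(seed: int, num_samples: Optional[int] = None) -> Dataset:
    """
    Load LongText-Bench prompts for text-to-image evaluation.

    Parameters
    ----------
    seed : int
        Seed used to shuffle before subsampling.
    num_samples : Optional[int]
        If given, keep only this many prompts.
    """
    # Split name is an assumption; the hub dataset may expose a different one.
    dataset = load_dataset("X-Omni/LongText-Bench", split="test")
    dataset = dataset.shuffle(seed=seed)
    if num_samples is not None:
        dataset = dataset.select(range(min(num_samples, len(dataset))))
    # text_content stays a column so prompt_with_auxiliaries_collate can
    # forward it to the text_score metric as an auxiliary.
    return dataset
```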

Test plan

  • test_long_text_bench_auxiliaries passes (a minimal shape is sketched after this list)
  • test_dm_from_string[LongTextBench-...] passes
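A minimal shape for the auxiliaries test, assuming from_string name resolution and a (prompts, auxiliaries) batch layout from prompt_with_auxiliaries_collate (the real tests live in test_datamodule.py per the commit log below):

```python
from pruna.data.pruna_datamodule import PrunaDataModule


def test_long_text_bench_auxiliaries():
    # Assumes "LongTextBench" is the name registered in base_datasets.
    dm = PrunaDataModule.from_string("LongTextBench")
    # Assumes auxiliaries is a list of per-sample dicts; the actual
    # collate layout may differ.
    prompts, auxiliaries = next(iter(dm.test_dataloader()))
    assert all("text_content" in aux for aux in auxiliaries)
```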

davidberenstein1957 and others added 9 commits on January 22, 2026 at 10:58
…mpts benchmark

- Introduced `from_benchmark` method in `PrunaDataModule` to create instances from benchmark classes.
- Added `Benchmark`, `BenchmarkEntry`, and `BenchmarkRegistry` classes for managing benchmarks.
- Implemented `PartiPrompts` benchmark for text-to-image generation with various categories and challenges.
- Created utility function `benchmark_to_datasets` to convert benchmarks into datasets compatible with `PrunaDataModule`.
- Added integration tests for benchmark functionality and data module interactions.
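A rough sketch of that (since-removed) abstraction, purely to illustrate the registry pattern the follow-up commit strips out; everything beyond the class names above is invented:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BenchmarkEntry:
    """One registered benchmark: a name plus a dataset factory."""

    name: str
    factory: Callable[..., object]
    metrics: List[str] = field(default_factory=list)


class BenchmarkRegistry:
    """Name-to-entry lookup backing PrunaDataModule.from_benchmark."""

    _entries: Dict[str, BenchmarkEntry] = {}

    @classmethod
    def register(cls, entry: BenchmarkEntry) -> None:
        cls._entries[entry.name] = entry

    @classmethod
    def get(cls, name: str) -> BenchmarkEntry:
        return cls._entries[name]
```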
…filtering

- Remove heavy benchmark abstraction (Benchmark class, registry, adapter, 24 subclasses)
- Extend setup_parti_prompts_dataset with category and num_samples params
- Add BenchmarkInfo dataclass for metadata (metrics, description, subsets)
- Switch PartiPrompts to prompt_with_auxiliaries_collate to preserve Category/Challenge
- Merge tests into test_datamodule.py

Reduces 964 lines to 128 lines (87% reduction)

Co-authored-by: Cursor <cursoragent@cursor.com>
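A minimal sketch of the BenchmarkInfo dataclass this commit describes; only the metrics, description, and subsets fields come from the message, and the docstring layout doubles as an example of the Numpydoc fixes in the later commits:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BenchmarkInfo:
    """
    Metadata attached to a registered benchmark dataset.

    Parameters
    ----------
    metrics : List[str]
        Metric names, matching those in the metric registry.
    description : str
        Short human-readable description of the benchmark.
    subsets : Optional[List[str]]
        Named subsets (e.g. prompt categories) the dataset supports.
    """

    metrics: List[str] = field(default_factory=list)
    description: str = ""
    subsets: Optional[List[str]] = None
```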
Add LongTextBench for evaluating text-to-image with long, complex prompts.
Uses X-Omni/LongText-Bench dataset from HuggingFace.

- Add setup_long_text_bench_dataset with num_samples filtering
- Register in base_datasets with prompt_with_auxiliaries_collate
- Add BenchmarkInfo with metrics: ["text_score"]
- Preserve text_content field in auxiliaries for evaluation
- Add tests

Co-authored-by: Cursor <cursoragent@cursor.com>
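The registration itself is likely a one-liner in src/pruna/data/__init__.py; the tuple layout below is an assumption, reusing the sketches above:

```python
# base_datasets and benchmark_info are the registries defined in
# src/pruna/data/__init__.py; the (setup fn, collate fn) layout is assumed.
base_datasets["LongTextBench"] = (
    setup_long_text_bench_dataset,
    prompt_with_auxiliaries_collate,
)
benchmark_info["LongTextBench"] = BenchmarkInfo(
    metrics=["text_score"],
    description="Long, complex prompts from X-Omni/LongText-Bench.",
)
```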
Move summary to new line after opening quotes per Numpydoc GL01.

Co-authored-by: Cursor <cursoragent@cursor.com>

Document all dataclass fields per Numpydoc PR01 with summary on new line per GL01.

Co-authored-by: Cursor <cursoragent@cursor.com>
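For reference, the two Numpydoc rules these commits enforce (GL01: summary starts on the line after the opening quotes; PR01: every parameter documented) look like this on a hypothetical function:

```python
def text_score(prompt: str, rendered_text: str) -> float:
    """
    Score how faithfully rendered text matches the prompt.

    Parameters
    ----------
    prompt : str
        The long-form prompt containing the expected text.
    rendered_text : str
        Text recovered from the generated image.

    Returns
    -------
    float
        Similarity score in [0, 1].
    """
    # Toy implementation; the real metric is registered as text_score.
    return float(prompt.strip() == rendered_text.strip())
```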
- Add list_benchmarks() to filter benchmarks by task type
- Add get_benchmark_info() to retrieve benchmark metadata
- Add COCO, ImageNet, WikiText to benchmark_info registry

Co-authored-by: Cursor <cursoragent@cursor.com>

Update benchmark metrics to match registered names:
- clip -> clip_score
- clip_iqa -> clipiqa
- Remove unimplemented top5_accuracy

Co-authored-by: Cursor <cursoragent@cursor.com>

- Add list_benchmarks() to filter benchmarks by task type
- Add get_benchmark_info() to retrieve benchmark metadata
- Add COCO, ImageNet, WikiText to benchmark_info registry
- Fix metric names to match MetricRegistry (clip_score, clipiqa)

Co-authored-by: Cursor <cursoragent@cursor.com>
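A plausible shape for the two lookup helpers; only the function names come from the commits, and the task-filtering criterion is an assumption:

```python
from typing import List, Optional


def list_benchmarks(task: Optional[str] = None) -> List[str]:
    """
    List registered benchmark names, optionally filtered by task type.
    """
    names = []
    for name, info in benchmark_info.items():
        # Assumes BenchmarkInfo (or the registry) records a task type;
        # the actual filtering criterion may differ.
        if task is None or getattr(info, "task", None) == task:
            names.append(name)
    return names


def get_benchmark_info(name: str) -> BenchmarkInfo:
    """
    Return the BenchmarkInfo registered under ``name``.
    """
    return benchmark_info[name]
```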
davidberenstein1957 changed the base branch from feat/add-partiprompts-benchmark-to-pruna to main on January 31, 2026 at 16:04
@github-actions

This PR has been inactive for 10 days and is now marked as stale.

The github-actions bot added the stale label on Feb 14, 2026.


Development

Successfully merging this pull request may close these issues:

[BENCHMARK] Add LongTextBench benchmark

1 participant