
feat: add LongTextBench benchmark #506

Open

davidberenstein1957 wants to merge 9 commits into main from feat/add-longtextbench-benchmark

Conversation


davidberenstein1957 (Member) commented on Jan 31, 2026

Closes #513

Summary

  • Add LongTextBench for evaluating text-to-image models on long, complex prompts
  • Uses the X-Omni/LongText-Bench dataset from Hugging Face
  • Preserves the text_content field in auxiliaries for text_score evaluation

Changes

  • Add setup_long_text_bench_dataset in src/pruna/data/datasets/prompt.py (a sketch follows this list)
  • Register it in base_datasets in src/pruna/data/__init__.py
  • Add a BenchmarkInfo entry with metrics: ["text_score"]
  • Add tests for basic loading and auxiliaries
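The loader might look roughly like this: a minimal sketch assuming the Hugging Face datasets API, a test split on the hub, and a seed/num_samples signature following the other setup functions in prompt.py (none of this is copied from the diff):

```python
from typing import Optional

from datasets import Dataset, load_dataset


def setup_long_text_bench_dataset(seed: int, num_samples: Optional[int] = None) -> Dataset:
    """
    Load LongText-Bench prompts for text-to-image evaluation.

    Parameters
    ----------
    seed : int
        Seed used to shuffle before subsampling.
    num_samples : Optional[int]
        If given, keep only this many prompts.
    """
    # Split name is an assumption; the hub dataset may expose a different one.
    dataset = load_dataset("X-Omni/LongText-Bench", split="test")
    dataset = dataset.shuffle(seed=seed)
    if num_samples is not None:
        dataset = dataset.select(range(min(num_samples, len(dataset))))
    # text_content stays a column so prompt_with_auxiliaries_collate can
    # forward it to the text_score metric as an auxiliary.
    return dataset
```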

Test plan

  • test_long_text_bench_auxiliaries passes (a minimal shape is sketched after this list)
  • test_dm_from_string[LongTextBench-...] passes
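A minimal shape for the auxiliaries test, assuming from_string name resolution and a (prompts, auxiliaries) batch layout from prompt_with_auxiliaries_collate (the real tests live in test_datamodule.py per the commit log below):

```python
from pruna.data.pruna_datamodule import PrunaDataModule


def test_long_text_bench_auxiliaries():
    # Assumes "LongTextBench" is the name registered in base_datasets.
    dm = PrunaDataModule.from_string("LongTextBench")
    # Assumes auxiliaries is a list of per-sample dicts; the actual
    # collate layout may differ.
    prompts, auxiliaries = next(iter(dm.test_dataloader()))
    assert all("text_content" in aux for aux in auxiliaries)
```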

davidberenstein1957 and others added 9 commits on January 22, 2026 at 10:58
…mpts benchmark

- Introduced `from_benchmark` method in `PrunaDataModule` to create instances from benchmark classes.
- Added `Benchmark`, `BenchmarkEntry`, and `BenchmarkRegistry` classes for managing benchmarks.
- Implemented `PartiPrompts` benchmark for text-to-image generation with various categories and challenges.
- Created utility function `benchmark_to_datasets` to convert benchmarks into datasets compatible with `PrunaDataModule`.
- Added integration tests for benchmark functionality and data module interactions.
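A rough sketch of that (since-removed) abstraction, purely to illustrate the registry pattern the follow-up commit strips out; everything beyond the class names above is invented:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BenchmarkEntry:
    """One registered benchmark: a name plus a dataset factory."""

    name: str
    factory: Callable[..., object]
    metrics: List[str] = field(default_factory=list)


class BenchmarkRegistry:
    """Name-to-entry lookup backing PrunaDataModule.from_benchmark."""

    _entries: Dict[str, BenchmarkEntry] = {}

    @classmethod
    def register(cls, entry: BenchmarkEntry) -> None:
        cls._entries[entry.name] = entry

    @classmethod
    def get(cls, name: str) -> BenchmarkEntry:
        return cls._entries[name]
```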
…filtering

- Remove heavy benchmark abstraction (Benchmark class, registry, adapter, 24 subclasses)
- Extend setup_parti_prompts_dataset with category and num_samples params
- Add BenchmarkInfo dataclass for metadata (metrics, description, subsets)
- Switch PartiPrompts to prompt_with_auxiliaries_collate to preserve Category/Challenge
- Merge tests into test_datamodule.py

Reduces 964 lines to 128 lines (87% reduction)

Co-authored-by: Cursor <cursoragent@cursor.com>
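A minimal sketch of the BenchmarkInfo dataclass this commit describes; only the metrics, description, and subsets fields come from the message, and the docstring layout doubles as an example of the Numpydoc fixes in the later commits:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BenchmarkInfo:
    """
    Metadata attached to a registered benchmark dataset.

    Parameters
    ----------
    metrics : List[str]
        Metric names, matching those in the metric registry.
    description : str
        Short human-readable description of the benchmark.
    subsets : Optional[List[str]]
        Named subsets (e.g. prompt categories) the dataset supports.
    """

    metrics: List[str] = field(default_factory=list)
    description: str = ""
    subsets: Optional[List[str]] = None
```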
Add LongTextBench for evaluating text-to-image with long, complex prompts.
Uses X-Omni/LongText-Bench dataset from HuggingFace.

- Add setup_long_text_bench_dataset with num_samples filtering
- Register in base_datasets with prompt_with_auxiliaries_collate
- Add BenchmarkInfo with metrics: ["text_score"]
- Preserve text_content field in auxiliaries for evaluation
- Add tests

Co-authored-by: Cursor <cursoragent@cursor.com>
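The registration itself is likely a one-liner in src/pruna/data/__init__.py; the tuple layout below is an assumption, reusing the sketches above:

```python
# base_datasets and benchmark_info are the registries defined in
# src/pruna/data/__init__.py; the (setup fn, collate fn) layout is assumed.
base_datasets["LongTextBench"] = (
    setup_long_text_bench_dataset,
    prompt_with_auxiliaries_collate,
)
benchmark_info["LongTextBench"] = BenchmarkInfo(
    metrics=["text_score"],
    description="Long, complex prompts from X-Omni/LongText-Bench.",
)
```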
Move summary to new line after opening quotes per Numpydoc GL01.

Co-authored-by: Cursor <cursoragent@cursor.com>

Document all dataclass fields per Numpydoc PR01 with summary on new line per GL01.

Co-authored-by: Cursor <cursoragent@cursor.com>
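For reference, the two Numpydoc rules these commits enforce (GL01: summary starts on the line after the opening quotes; PR01: every parameter documented) look like this on a hypothetical function:

```python
def text_score(prompt: str, rendered_text: str) -> float:
    """
    Score how faithfully rendered text matches the prompt.

    Parameters
    ----------
    prompt : str
        The long-form prompt containing the expected text.
    rendered_text : str
        Text recovered from the generated image.

    Returns
    -------
    float
        Similarity score in [0, 1].
    """
    # Toy implementation; the real metric is registered as text_score.
    return float(prompt.strip() == rendered_text.strip())
```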
- Add list_benchmarks() to filter benchmarks by task type
- Add get_benchmark_info() to retrieve benchmark metadata
- Add COCO, ImageNet, WikiText to benchmark_info registry

Co-authored-by: Cursor <cursoragent@cursor.com>

Update benchmark metrics to match registered names:
- clip -> clip_score
- clip_iqa -> clipiqa
- Remove unimplemented top5_accuracy

Co-authored-by: Cursor <cursoragent@cursor.com>

- Add list_benchmarks() to filter benchmarks by task type
- Add get_benchmark_info() to retrieve benchmark metadata
- Add COCO, ImageNet, WikiText to benchmark_info registry
- Fix metric names to match MetricRegistry (clip_score, clipiqa)

Co-authored-by: Cursor <cursoragent@cursor.com>
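A plausible shape for the two lookup helpers; only the function names come from the commits, and the task-filtering criterion is an assumption:

```python
from typing import List, Optional


def list_benchmarks(task: Optional[str] = None) -> List[str]:
    """
    List registered benchmark names, optionally filtered by task type.
    """
    names = []
    for name, info in benchmark_info.items():
        # Assumes BenchmarkInfo (or the registry) records a task type;
        # the actual filtering criterion may differ.
        if task is None or getattr(info, "task", None) == task:
            names.append(name)
    return names


def get_benchmark_info(name: str) -> BenchmarkInfo:
    """
    Return the BenchmarkInfo registered under ``name``.
    """
    return benchmark_info[name]
```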
davidberenstein1957 changed the base branch from feat/add-partiprompts-benchmark-to-pruna to main on January 31, 2026 at 16:04
@github-actions

This PR has been inactive for 10 days and is now marked as stale.

The github-actions bot added the stale label on Feb 14, 2026.


Development

Successfully merging this pull request may close these issues:

[BENCHMARK] Add LongTextBench benchmark

1 participant