Conversation
…mpts benchmark

- Introduced `from_benchmark` method in `PrunaDataModule` to create instances from benchmark classes.
- Added `Benchmark`, `BenchmarkEntry`, and `BenchmarkRegistry` classes for managing benchmarks.
- Implemented `PartiPrompts` benchmark for text-to-image generation with various categories and challenges.
- Created utility function `benchmark_to_datasets` to convert benchmarks into datasets compatible with `PrunaDataModule`.
- Added integration tests for benchmark functionality and data module interactions.
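As a rough illustration of the abstraction this commit introduces (and which the follow-up commit below removes again), a minimal sketch: the names `Benchmark`, `BenchmarkEntry`, and `BenchmarkRegistry` come from the commit message, while every field, method body, and signature shown here is an assumption.

```python
# Hedged sketch of the benchmark abstraction named in the commit; all
# bodies and signatures beyond the three class names are assumptions.
from dataclasses import dataclass, field


@dataclass
class BenchmarkEntry:
    # One prompt plus its metadata (e.g. Category, Challenge).
    prompt: str
    auxiliaries: dict = field(default_factory=dict)


class Benchmark:
    # Base class: subclasses yield BenchmarkEntry items for one benchmark.
    name: str = "base"

    def entries(self) -> list[BenchmarkEntry]:
        raise NotImplementedError


class BenchmarkRegistry:
    # Maps benchmark names to Benchmark subclasses for lookup by name.
    _registry: dict[str, type[Benchmark]] = {}

    @classmethod
    def register(cls, benchmark_cls: type[Benchmark]) -> type[Benchmark]:
        cls._registry[benchmark_cls.name] = benchmark_cls
        return benchmark_cls

    @classmethod
    def get(cls, name: str) -> type[Benchmark]:
        return cls._registry[name]
```

Per the commit message, `PrunaDataModule.from_benchmark` would then wrap the output of `benchmark_to_datasets` around such a class; neither signature is shown here because both are unknown.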
…filtering

- Remove heavy benchmark abstraction (Benchmark class, registry, adapter, 24 subclasses)
- Extend setup_parti_prompts_dataset with category and num_samples params
- Add BenchmarkInfo dataclass for metadata (metrics, description, subsets)
- Switch PartiPrompts to prompt_with_auxiliaries_collate to preserve Category/Challenge
- Merge tests into test_datamodule.py

Reduces 964 lines to 128 lines (87% reduction)

Co-authored-by: Cursor <cursoragent@cursor.com>
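A sketch of the slimmer replacement. `BenchmarkInfo` and its three metadata fields are named in the commit; the types, defaults, dataset ID, and filtering logic below are assumptions.

```python
# Hedged sketch: field names come from the commit message; types, defaults,
# and the setup function body are assumptions.
from dataclasses import dataclass, field

from datasets import load_dataset


@dataclass
class BenchmarkInfo:
    """Lightweight benchmark metadata replacing the class hierarchy."""

    metrics: list[str] = field(default_factory=list)
    description: str = ""
    subsets: list[str] = field(default_factory=list)


def setup_parti_prompts_dataset(
    seed: int,
    category: str | None = None,
    num_samples: int | None = None,
):
    # Assumed flow: load PartiPrompts, optionally keep one Category,
    # then cap the sample count after a seeded shuffle.
    ds = load_dataset("nateraw/parti-prompts", split="train")
    if category is not None:
        ds = ds.filter(lambda row: row["Category"] == category)
    if num_samples is not None:
        ds = ds.shuffle(seed=seed).select(range(min(num_samples, len(ds))))
    return ds
```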
Add LongTextBench for evaluating text-to-image with long, complex prompts. Uses X-Omni/LongText-Bench dataset from HuggingFace.

- Add setup_long_text_bench_dataset with num_samples filtering
- Register in base_datasets with prompt_with_auxiliaries_collate
- Add BenchmarkInfo with metrics: ["text_score"]
- Preserve text_content field in auxiliaries for evaluation
- Add tests

Co-authored-by: Cursor <cursoragent@cursor.com>
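A corresponding sketch for the new loader. The dataset ID is taken from the commit; the split name and subsampling strategy are assumptions.

```python
# Hedged sketch of setup_long_text_bench_dataset; split name and
# subsampling strategy are assumptions.
from datasets import load_dataset


def setup_long_text_bench_dataset(seed: int, num_samples: int | None = None):
    ds = load_dataset("X-Omni/LongText-Bench", split="train")
    if num_samples is not None:
        ds = ds.shuffle(seed=seed).select(range(min(num_samples, len(ds))))
    # text_content rides along in the auxiliaries so the text_score metric
    # can compare rendered text against the expected string.
    return ds
```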
Move summary to new line after opening quotes per Numpydoc GL01. Co-authored-by: Cursor <cursoragent@cursor.com>
Document all dataclass fields per Numpydoc PR01 with summary on new line per GL01. Co-authored-by: Cursor <cursoragent@cursor.com>
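For reference, the two Numpydoc rules these commits apply, shown on a placeholder dataclass (the class and field names here are illustrative, not the real ones):

```python
from dataclasses import dataclass, field


@dataclass
class ExampleInfo:
    """
    Summary begins on the line after the opening quotes (GL01).

    Parameters
    ----------
    metrics : list[str]
        Every dataclass field gets its own entry (PR01).
    description : str
        Human-readable description of the benchmark.
    """

    metrics: list[str] = field(default_factory=list)
    description: str = ""
```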
- Add list_benchmarks() to filter benchmarks by task type
- Add get_benchmark_info() to retrieve benchmark metadata
- Add COCO, ImageNet, WikiText to benchmark_info registry

Co-authored-by: Cursor <cursoragent@cursor.com>
Update benchmark metrics to match registered names:

- clip -> clip_score
- clip_iqa -> clipiqa
- Remove unimplemented top5_accuracy

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add list_benchmarks() to filter benchmarks by task type
- Add get_benchmark_info() to retrieve benchmark metadata
- Add COCO, ImageNet, WikiText to benchmark_info registry
- Fix metric names to match MetricRegistry (clip_score, clipiqa)

Co-authored-by: Cursor <cursoragent@cursor.com>
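A minimal sketch of the two helpers, assuming a module-level `benchmark_info` dict keyed by benchmark name and an assumed `task` attribute on `BenchmarkInfo` for the filtering; the real registry layout may differ.

```python
# Hedged sketch: the dict layout and the `task` attribute are assumptions;
# BenchmarkInfo is the dataclass from the earlier commit.
benchmark_info: dict[str, "BenchmarkInfo"] = {}


def list_benchmarks(task: str | None = None) -> list[str]:
    # Return all benchmark names, or only those matching a task type
    # such as "text_to_image".
    return sorted(
        name
        for name, info in benchmark_info.items()
        if task is None or getattr(info, "task", None) == task
    )


def get_benchmark_info(name: str) -> "BenchmarkInfo":
    # Look up the metadata (metrics, description, subsets) for one benchmark.
    return benchmark_info[name]
```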
…-longtextbench-benchmark
This PR has been inactive for 10 days and is now marked as stale.
Closes #513
Summary
- Uses the `X-Omni/LongText-Bench` dataset from HuggingFace
- Preserves the `text_content` field in auxiliaries for `text_score` evaluation

Changes
- Add `setup_long_text_bench_dataset` in `src/pruna/data/datasets/prompt.py`
- Register in `base_datasets` in `src/pruna/data/__init__.py`
- Add `BenchmarkInfo` with metrics: `["text_score"]`

Test plan
- `test_long_text_bench_auxiliaries` passes
- `test_dm_from_string[LongTextBench-...]` passes
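A hypothetical end-to-end check of the result, assuming `PrunaDataModule.from_string` is the entry point exercised by `test_dm_from_string`, that `limit_datasets` caps each split, and that `prompt_with_auxiliaries_collate` yields `(prompts, auxiliaries)` batches; the batch structure is an assumption.

```python
from pruna.data.pruna_datamodule import PrunaDataModule

dm = PrunaDataModule.from_string("LongTextBench")
dm.limit_datasets(4)  # assumption: caps each split for a quick smoke test

# Assumed batch shape from prompt_with_auxiliaries_collate.
prompts, auxiliaries = next(iter(dm.test_dataloader()))
print(auxiliaries[0]["text_content"])  # expected text for text_score
```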