Add README documentation #79
Changes from all commits
bba2bdb
c29d42b
b72eeb9
41c7ee0
336ee5a
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,78 @@ | ||||||||
| # GIMBench | ||||||||
|
|
||||||||
| GIMBench is a benchmarking framework for evaluating Guided Infilling Models (GIM). | ||||||||
|
|
||||||||
| ## Overview | ||||||||
|
|
||||||||
| This project provides tools and benchmarks to evaluate models' ability to perform guided infilling tasks - generating text that follows specific constraints and patterns. | ||||||||
|
|
||||||||
| ## Installation | ||||||||
|
|
||||||||
| Install GIMBench using pip: | ||||||||
|
|
||||||||
| ```bash | ||||||||
| pip install gimbench | ||||||||
| ``` | ||||||||
|
|
||||||||
| For development: | ||||||||
|
|
||||||||
| ```bash | ||||||||
| make install-dev | ||||||||
| ``` | ||||||||
|
|
||||||||
| ## Usage | ||||||||
|
|
||||||||
| GIMBench provides several benchmark types: | ||||||||
|
|
||||||||
| - **CV Parsing**: Evaluate models on structured information extraction from CVs | ||||||||
| - **Regex Matching**: Test models' ability to generate text matching specific patterns | ||||||||
| - **Multiple Choice QA**: Assess guided generation in question-answering contexts | ||||||||
| - **Perplexity**: Measure language modeling quality with constraints | ||||||||
|
|
||||||||
| ### Example Commands | ||||||||
|
|
||||||||
| Run MMLU-Pro benchmark: | ||||||||
|
|
||||||||
| ```bash | ||||||||
| python -m gimbench.mcqa.mmlu_pro \ | ||||||||
| --model_type vllm \ | ||||||||
| --model_name meta-llama/Llama-3.1-8B-Instruct \ | ||||||||
| --base_url http://localhost:8000/v1 | ||||||||
| ``` | ||||||||
|
|
||||||||
| Run GPQA Diamond benchmark: | ||||||||
|
|
||||||||
| ```bash | ||||||||
| python -m gimbench.mcqa.gpqa_diamond \ | ||||||||
| --model_type openai \ | ||||||||
| --model_name gpt-4 \ | ||||||||
| --base_url https://api.openai.com/v1 \ |
The development install instructions rely on `make install-dev`, which runs `uv sync` (see Makefile). Consider documenting `uv` as a prerequisite (or provide an alternative dev install command) so the steps are reproducible for new contributors.
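
One way the README could make the prerequisite explicit is a short check before the dev install. This is only a sketch: the `require_tool` helper is hypothetical and not part of the repository, it assumes `uv` is the only extra tool `make install-dev` needs, and `pip install uv` is just one of several ways to install `uv`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GIMBench): succeeds if the named command is on PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Tell the contributor what to do next, depending on whether uv is available.
if require_tool uv; then
  echo "uv found; run: make install-dev"
else
  echo "uv not found; install it first, for example: pip install uv" >&2
fi
```

The check only prints guidance and does not run `make install-dev` itself, so it is safe to paste into a shell before following the development instructions.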