GIMBench

GIMBench is a benchmarking framework for evaluating Guided Infilling Models (GIM).

Overview

This project provides tools and benchmarks to evaluate a model's ability to perform guided infilling tasks: generating text that follows specific constraints and patterns.

Installation

Install GIMBench using pip:

pip install gimbench

For a development install:

make install-dev

Usage

GIMBench provides several benchmark types:

  • CV Parsing: Evaluate models on structured information extraction from CVs
  • Regex Matching: Test models' ability to generate text matching specific patterns
  • Multiple Choice QA: Assess guided generation in question-answering contexts
  • Perplexity: Measure language modeling quality with constraints

Example Commands

Run the MMLU-Pro benchmark:

python -m gimbench.mcqa.mmlu_pro \
    --model_type vllm \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --base_url http://localhost:8000/v1
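The --model_type vllm and --base_url flags suggest the benchmark talks to an already-running OpenAI-compatible endpoint rather than loading the model in-process. A minimal sketch of launching such an endpoint with vLLM's built-in server (the model name and port here just mirror the example above and are assumptions, not GIMBench requirements):

```shell
# Start an OpenAI-compatible server on localhost:8000 (serves /v1/... routes).
# Run this in a separate terminal before invoking the benchmark.
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
```

Once the server reports it is ready, the MMLU-Pro command above should be able to reach it at http://localhost:8000/v1.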

Run the GPQA Diamond benchmark:

python -m gimbench.mcqa.gpqa_diamond \
    --model_type openai \
    --model_name gpt-4 \
    --api_key YOUR_API_KEY

Run the GIM-SFT perplexity evaluation:

python -m gimbench.ppl.gim_sft \
    --model_type vllm-offline \
    --model_name meta-llama/Llama-3.1-8B-Instruct

Development

Run the linter:

make lint

Fix linting issues automatically:

make lint-fix

Run pre-commit hooks:

make pre-commit
