Model-Benchmark-Suite

A user-friendly Streamlit UI for running various lm_eval-supported benchmarks on large language models and comparing them with one another.

Supported Benchmarks:

  • gpqa_diamond_zeroshot
  • gsm8k
  • winogrande
  • arc_challenge
  • hellaswag
  • truthfulqa_mc2
  • mmlu
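
Each entry above is the lm_eval task name that the UI passes to the harness. For reference, the same benchmarks can also be run outside the UI through lm_eval's Python API; a minimal sketch (the model checkpoint below is an arbitrary example, not something the suite prescribes):

import lm_eval

# Evaluate a Hugging Face model on one of the supported tasks.
# "hf" selects the Hugging Face backend; model_args points at a checkpoint.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["gsm8k"],
    num_fewshot=0,
)

# Per-task metrics live under the "results" key.
print(results["results"])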

Quick Start

Clone the repo:

git clone https://github.com/TeichAI/Model-Benchmark-Suite.git
cd Model-Benchmark-Suite

Install the dependencies and start the app:

pip install -r requirements.txt
streamlit run app.py
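
Streamlit serves the app locally, by default at http://localhost:8501; open that address in a browser to select a model and the benchmarks to run.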
