Model merging has emerged as a popular paradigm for improving pretrained model performance at lower parameter counts and without the need to retrain. There are several ways to merge models, and a significant amount of ongoing work is testing different combinations of models to find out which can bring about large improvements.
MLLM is a framework for evaluating different merged, pre-trained LLMs. We aim to benchmark and evaluate a Merge of Large Language Models (MLLM) that leverages several medium-sized LLMs (~100 million parameters each) to attempt to match the performance of larger, state-of-the-art LLMs. We focus on classification performance in a particular domain, in our case the Drug Review Dataset (Drugs.com) via the UCI ML Repository.
Authors:
- Austin Tao (austin.tao@berkeley.edu)
- Robert Thompson (robert_thompson@berkeley.edu)
- Phudish (Tam) Prateepamornkul (phudish_p@berkeley.edu)
- Sean McAvoy (sean_mcavoy@berkeley.edu)
We use MergeKit to run our LLM merging operations. Specifically, we evaluate three merge techniques:
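For reference, MergeKit merges are driven by a YAML config. A sketch of a TIES merge config might look like the following (the fine-tuned model names are placeholders for illustration, not models we publish):

```yaml
merge_method: ties
base_model: gpt2
models:
  - model: gpt2-finetuned-a   # placeholder fine-tune
    parameters:
      weight: 0.5
      density: 0.5
  - model: gpt2-finetuned-b   # placeholder fine-tune
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: float16
```

A config like this is then run with the `mergekit-yaml` command-line tool to produce the merged checkpoint.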
- Linear Merge (Wortsman et al.)
- TIES Merge (Yadav et al.)
- DARE + TIES/Linear Merge (Yu et al.)
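To make the three techniques concrete, here is a minimal NumPy sketch of each. This is illustrative only: our actual merges run through MergeKit, and the sign-election step in particular is simplified relative to the TIES paper.

```python
import numpy as np

def linear_merge(weights, coeffs):
    """Linear merge: weighted average of corresponding parameter tensors."""
    return sum(c * w for c, w in zip(coeffs, weights))

def ties_merge(base, finetuned, density=0.5):
    """TIES sketch: trim small task-vector entries, elect a sign per
    parameter, then average only the values that agree with that sign."""
    task_vectors = [w - base for w in finetuned]
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(np.ceil(density * tv.size)))  # keep top-k by magnitude
        thresh = np.sort(np.abs(tv).ravel())[-k]
        trimmed.append(np.where(np.abs(tv) >= thresh, tv, 0.0))
    stacked = np.stack(trimmed)
    elected = np.sign(stacked.sum(axis=0))           # simplified sign election
    agree = np.sign(stacked) == elected
    merged = np.where(agree, stacked, 0.0).sum(axis=0) / np.maximum(agree.sum(axis=0), 1)
    return base + merged

def dare(task_vector, drop_rate=0.9, rng=None):
    """DARE sketch: randomly drop task-vector entries and rescale the
    survivors by 1/(1 - p) so the expected update is unchanged."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(task_vector.shape) >= drop_rate
    return np.where(mask, task_vector, 0.0) / (1.0 - drop_rate)
```

DARE composes with the other two methods: it sparsifies each task vector first, and the surviving deltas are then combined with a linear or TIES merge.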
For the base LLM, we use GPT-2 Radford et al. for its relatively small size and moderate base performance.
So far we are evaluating FLAN pretrained models from Hugging Face, and are considering merging with smaller iterations of LLaMA and other more domain-specific models. We will also assess summarization performance on the same domain, the Drug Review Dataset (Drugs.com) via the UCI ML Repository. In addition, we use two other datasets, summarized in the table below:
| Name | Average Word Count | Median Word Count | Word Count Std Dev |
|---|---|---|---|
| Yelp | 134.1 | 99.0 | 121.4 |
| Drug | 84.7 | 84.0 | 45.0 |
| Android | 11.8 | 5.0 | 17.1 |
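The word-count statistics above can be recomputed with a small helper like the following. Loading each dataset's review strings is dataset-specific and omitted here; the toy usage is purely illustrative.

```python
import statistics

def word_count_stats(texts):
    """Mean, median, and sample standard deviation of whitespace word counts."""
    counts = [len(t.split()) for t in texts]
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "std": statistics.stdev(counts),
    }

# Toy usage; in the project these would be the raw review strings.
stats = word_count_stats(["great drug no side effects", "did not work", "ok"])
```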