
Merged Large Language Models (MLLM): Blending Ensemble Learning and Deep Learning

Model merging has emerged as a popular new paradigm for improving pretrained model performance at a lower parameter count and without the need to retrain. There are several ways to merge models, and a significant amount of ongoing work is testing different combinations of models to find out which ones yield large improvements.

MLLM is a framework for evaluating different merged, pre-trained LLMs. We aim to benchmark a Merge of Large Language Models (MLLM) that leverages several medium-sized LLMs (~100 million parameters each) to attempt to match the performance of larger, state-of-the-art LLMs. We focus on classification performance in a particular domain, in our case the Drug Review Dataset (Drugs.com) from the UCI ML Repository.
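As a rough illustration of the classification setup, the sketch below scores drug reviews with a GPT-2 checkpoint wrapped in a sequence-classification head. The checkpoint name, label count, and example reviews are placeholders, not the project's actual configuration; a merged checkpoint directory would be passed in place of "gpt2", and the classification head would need fine-tuning before its predictions are meaningful.

```python
import torch
from transformers import GPT2TokenizerFast, GPT2ForSequenceClassification

# Placeholder checkpoint: a merged model directory would be used instead.
model_name = "gpt2"
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2ForSequenceClassification.from_pretrained(model_name, num_labels=2)

# GPT-2 ships without a pad token; reuse EOS so batched inputs can be padded.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id

reviews = [
    "This medication worked well and the side effects were mild.",
    "Terrible experience, I stopped taking it after a week.",
]

inputs = tokenizer(reviews, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted class indices (untrained head here)
```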

Authors:

Technologies

We use MergeKit to run our LLM merging operations. Specifically, we evaluate three merge techniques:

For the base LLM, we use GPT-2 (Radford et al.) for its relatively small size and moderate baseline performance.
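MergeKit performs the actual merges in this project; as a minimal sketch of the simplest technique, the snippet below does a linear merge by averaging the parameters of two GPT-2 checkpoints element-wise. The checkpoint names are placeholders: in practice they would be two fine-tuned GPT-2 variants sharing the same architecture and tokenizer.

```python
from transformers import GPT2LMHeadModel

# Placeholder checkpoints; both must share the GPT-2 architecture exactly.
ckpt_a, ckpt_b = "gpt2", "gpt2"
model_a = GPT2LMHeadModel.from_pretrained(ckpt_a)
model_b = GPT2LMHeadModel.from_pretrained(ckpt_b)
state_b = model_b.state_dict()

# Linear merge: element-wise weighted average of matching parameter tensors.
alpha = 0.5
merged_state = {
    name: alpha * param + (1.0 - alpha) * state_b[name]
    for name, param in model_a.state_dict().items()
}

merged = GPT2LMHeadModel.from_pretrained(ckpt_a)
merged.load_state_dict(merged_state)
merged.save_pretrained("merged-gpt2")  # loadable like any Hugging Face checkpoint
```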

Datasets

So far we are evaluating FLAN pretrained models from Hugging Face, and we are considering merging with smaller iterations of LLaMA and other more domain-specific models. We will specifically assess summarization performance in a particular domain, in our case the Drug Review Dataset (Drugs.com) from the UCI ML Repository. We also use two other datasets, summarized in the table below.

Table: Descriptive statistics of the datasets

| Dataset | Average word count | Median word count | Word count std. dev. |
|---------|-------------------:|------------------:|---------------------:|
| Yelp    | 134.10             | 99.0              | 121.40                |
| Drug    | 84.70              | 84.0              | 45.04                 |
| Android | 11.78              | 5.0               | 17.06                 |
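These statistics can be reproduced with a short pandas script; a minimal sketch for the drug review split is below. The file name and the "review" column are assumptions based on the UCI release of the dataset.

```python
import pandas as pd

# File and column names are assumptions based on the UCI Drug Review release.
df = pd.read_csv("drugsComTrain_raw.tsv", sep="\t")

word_counts = df["review"].astype(str).str.split().str.len()
print(f"Average word count: {word_counts.mean():.2f}")
print(f"Median word count:  {word_counts.median():.1f}")
print(f"Word count std dev: {word_counts.std():.2f}")
```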

About

A framework for merging multiple LMs to improve OOD performance without additional training
