Model merging has emerged as a popular paradigm for improving pretrained model performance at lower parameter counts and without the need to retrain. There are several ways to merge models, and a significant amount of ongoing work is testing different combinations of models to find out which can bring about large improvements.
MLLM is a framework for evaluating different merged, pre-trained LLMs. We aim to benchmark and evaluate a Merge of Large Language Models (MLLM) that leverages several medium-sized LLMs (~100 million parameters each) to attempt to match the performance of larger, state-of-the-art LLMs. We focus on classification performance in a particular domain, in our case the Drug Review Dataset (Drugs.com) via the UCI ML Repository.
Authors:
- Austin Tao (austin.tao@berkeley.edu)
- Robert Thompson (robert_thompson@berkeley.edu)
- Phudish (Tam) Prateepamornkul (phudish_p@berkeley.edu)
- Sean McAvoy (sean_mcavoy@berkeley.edu)
We use MergeKit to run our LLM merging operations. Specifically, we evaluate three merge techniques:
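For reference, MergeKit merges are driven by a YAML config. A sketch of a TIES merge config might look like the following (the fine-tuned model names are placeholders for illustration, not models we publish):

```yaml
merge_method: ties
base_model: gpt2
models:
  - model: gpt2-finetuned-a   # placeholder fine-tune
    parameters:
      weight: 0.5
      density: 0.5
  - model: gpt2-finetuned-b   # placeholder fine-tune
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: float16
```

A config like this is then run with the `mergekit-yaml` command-line tool to produce the merged checkpoint.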
- Linear Merge (Wortsman et al.)
- TIES Merge (Yadav et al.)
- DARE + TIES/Linear Merge (Yu et al.)
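To make the three techniques concrete, here is a minimal NumPy sketch of each. This is illustrative only: our actual merges run through MergeKit, and the sign-election step in particular is simplified relative to the TIES paper.

```python
import numpy as np

def linear_merge(weights, coeffs):
    """Linear merge: weighted average of corresponding parameter tensors."""
    return sum(c * w for c, w in zip(coeffs, weights))

def ties_merge(base, finetuned, density=0.5):
    """TIES sketch: trim small task-vector entries, elect a sign per
    parameter, then average only the values that agree with that sign."""
    task_vectors = [w - base for w in finetuned]
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(np.ceil(density * tv.size)))  # keep top-k by magnitude
        thresh = np.sort(np.abs(tv).ravel())[-k]
        trimmed.append(np.where(np.abs(tv) >= thresh, tv, 0.0))
    stacked = np.stack(trimmed)
    elected = np.sign(stacked.sum(axis=0))           # simplified sign election
    agree = np.sign(stacked) == elected
    merged = np.where(agree, stacked, 0.0).sum(axis=0) / np.maximum(agree.sum(axis=0), 1)
    return base + merged

def dare(task_vector, drop_rate=0.9, rng=None):
    """DARE sketch: randomly drop task-vector entries and rescale the
    survivors by 1/(1 - p) so the expected update is unchanged."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(task_vector.shape) >= drop_rate
    return np.where(mask, task_vector, 0.0) / (1.0 - drop_rate)
```

DARE composes with the other two methods: it sparsifies each task vector first, and the surviving deltas are then combined with a linear or TIES merge.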
For the base LLM, we use GPT-2 Radford et al. for its relatively small size and moderate base performance.
So far we are evaluating FLAN pretrained models from Hugging Face, and are considering merging with smaller iterations of LLaMA and other more domain-specific models. We will also assess summarization performance on the same domain, the Drug Review Dataset (Drugs.com) via the UCI ML Repository. In addition, we use two other datasets, summarized in the table below:
| Name | Average Word Count | Median Word Count | Word Count Std Dev |
|---|---|---|---|
| Yelp | 134.1 | 99.0 | 121.4 |
| Drug | 84.7 | 84.0 | 45.0 |
| Android | 11.8 | 5.0 | 17.1 |
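The word-count statistics above can be recomputed with a small helper like the following. Loading each dataset's review strings is dataset-specific and omitted here; the toy usage is purely illustrative.

```python
import statistics

def word_count_stats(texts):
    """Mean, median, and sample standard deviation of whitespace word counts."""
    counts = [len(t.split()) for t in texts]
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "std": statistics.stdev(counts),
    }

# Toy usage; in the project these would be the raw review strings.
stats = word_count_stats(["great drug no side effects", "did not work", "ok"])
```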