GitHub - ZIB-IOL/Free-Lunch-in-LLM-Compression

A Free Lunch in LLM Compression: Revisiting Retraining after Pruning

This repository contains the code to reproduce the experiments from the paper "A Free Lunch in LLM Compression: Revisiting Retraining after Pruning". The code is based on PyTorch 2.8 and the experiment-tracking platform Weights & Biases. The results in the paper were generated using Python 3.12, CUDA 12.8, and the environment defined in requirements.txt.

Usage

main.py configures and starts experiments. Experiments are configured either within this file or via Weights & Biases. When the --debug flag is set, the config dictionary inside the file is used; otherwise, it is overwritten with a config dictionary provided by Weights & Biases.
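The debug/sweep switch described above can be sketched as follows. This is a minimal sketch, not the code in main.py: the keys in DEFAULT_CONFIG and the helper names resolve_config and parse_args are hypothetical, and it assumes the sweep values override the in-file defaults key by key (under Weights & Biases those values would typically come from wandb.config).

```python
import argparse

# Hypothetical in-file defaults; the real keys in main.py are not reproduced here.
DEFAULT_CONFIG = {
    "model": "llama-3-8b",  # hypothetical value
    "sparsity": "2:4",      # hypothetical value
}


def resolve_config(debug, default_config, sweep_config=None):
    """Return the config for this run.

    debug=True: use the in-file defaults unchanged.
    debug=False: overwrite the defaults with the sweep-provided values
    (assumed here to be a key-by-key override).
    """
    config = dict(default_config)  # copy so the defaults are never mutated
    if not debug:
        if sweep_config is None:
            raise ValueError("a sweep config is required when --debug is not set")
        config.update(sweep_config)
    return config


def parse_args(argv=None):
    """Parse the --debug flag (hypothetical CLI mirroring the README)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # Without --debug this sketch expects an external sweep config; here a
    # hypothetical override stands in for it.
    print(resolve_config(args.debug, DEFAULT_CONFIG, {"sparsity": "50%"}))
```

Copying the defaults before applying the override keeps DEFAULT_CONFIG intact, so several runs started from the same process do not leak settings into each other.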

An example Weights & Biases config for a sweep that prunes LLaMA-3-8B to 2:4 sparsity with Wanda and reconstructs with block size 1/2 (encoded as -1 in this code base) can be found in example_config.txt. The model with the best WikiText-2 perplexity is saved in checkpointdir/<sweep_id>. An example config for evaluating the best model found by the sweep is given in example_eval_config.txt.
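For readers unfamiliar with Wanda, its 2:4 variant can be sketched as follows. Each weight is scored by its magnitude times the L2 norm of the matching input feature over the calibration tokens, and within every group of four consecutive input weights of each output row the two highest-scoring weights are kept. This is a minimal pure-Python sketch of the published criterion, not the implementation in prune_methods.py; the function names are hypothetical, and the reconstruction step that follows pruning in this repository is not shown.

```python
import math


def wanda_scores(weight, inputs):
    """Wanda score |W_ij| * ||X_j||_2.

    weight: list of rows, shape (out_features, in_features).
    inputs: calibration activations, shape (num_tokens, in_features).
    """
    in_features = len(weight[0])
    # L2 norm of each input feature across all calibration tokens.
    col_norms = [math.sqrt(sum(x[j] ** 2 for x in inputs)) for j in range(in_features)]
    return [[abs(w) * col_norms[j] for j, w in enumerate(row)] for row in weight]


def n_m_mask(scores, n=2, m=4):
    """Keep the n highest scores in every group of m consecutive entries per row."""
    mask = []
    for row in scores:
        if len(row) % m != 0:
            raise ValueError("row length must be divisible by m")
        row_mask = [0] * len(row)
        for start in range(0, len(row), m):
            group = range(start, start + m)
            # Indices of the n largest scores in this group (stable on ties).
            keep = sorted(group, key=lambda j: row[j], reverse=True)[:n]
            for j in keep:
                row_mask[j] = 1
        mask.append(row_mask)
    return mask


def prune_2_4_wanda(weight, inputs):
    """Return (pruned_weight, mask) with a 2:4 pattern chosen by Wanda scores."""
    mask = n_m_mask(wanda_scores(weight, inputs), n=2, m=4)
    pruned = [[w * k for w, k in zip(row, mrow)] for row, mrow in zip(weight, mask)]
    return pruned, mask
```

With equal weights, the activations alone decide what survives: for weight [[1.0, 1.0, 1.0, 1.0]] and inputs [[0.0, 3.0, 1.0, 2.0]], the mask keeps the second and fourth entries, where plain magnitude pruning would face a four-way tie.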

The rest of the project is structured as follows:

caching_dummy.py: Contains a class that downloads and tokenizes the text data sets used for the experiments.
runner.py: Contains a class that prepares the model and data, starts the pruning and reconstruction, and manages fine-tuning.
customLayers.py: Contains custom modules used for MaskLoRA PEFT after pruning.
peft_methods.py: Contains classes used for applying custom PEFT modules to the pruned model.
prune_methods.py: Contains a pipeline for pruning and reconstructing the model.
prune_flap.py: Contains a pipeline for pruning with FLAP.
utilities.py: Contains useful auxiliary functions and classes.
check_act_dist_main.py: Configures and starts check_act_dist.py.
check_act_dist.py: Contains a class that measures the distances between the activations of given model checkpoints.
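The README does not define MaskLoRA. One plausible reading, stated here purely as an assumption, is a LoRA adapter whose low-rank update is multiplied elementwise by the pruning mask, so the merged weights keep the sparsity pattern produced by pruning. The sketch below illustrates that idea in pure Python with hypothetical helper names; it is not the code in customLayers.py or peft_methods.py.

```python
def matmul(a, b):
    """Plain matrix product of nested lists: (p x q) @ (q x r)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def masked_lora_merge(weight, mask, lora_a, lora_b, scaling=1.0):
    """Assumed MaskLoRA merge: mask * (weight + scaling * lora_b @ lora_a).

    weight, mask: (out, in); lora_b: (out, r); lora_a: (r, in).
    Entries with mask 0 stay exactly zero after the merge.
    """
    delta = matmul(lora_b, lora_a)
    return [
        [m * (w + scaling * d) for w, m, d in zip(w_row, m_row, d_row)]
        for w_row, m_row, d_row in zip(weight, mask, delta)
    ]


def masked_lora_forward(x, weight, mask, lora_a, lora_b, scaling=1.0):
    """Linear layer y = x @ W_eff^T with the masked, merged weight W_eff.

    x: (batch, in); returns (batch, out).
    """
    w_eff = masked_lora_merge(weight, mask, lora_a, lora_b, scaling)
    return [[sum(xi * wi for xi, wi in zip(x_row, w_row)) for w_row in w_eff] for x_row in x]


if __name__ == "__main__":
    weight = [[1.0, 0.0], [0.0, 2.0]]  # already pruned
    mask = [[1, 0], [0, 1]]
    lora_b = [[1.0], [1.0]]  # rank-1 adapter
    lora_a = [[1.0, 1.0]]
    print(masked_lora_merge(weight, mask, lora_a, lora_b, scaling=0.5))
```

Running the example gives [[1.5, 0.0], [0.0, 2.5]]. An unmasked LoRA merge of the same adapter would give [[1.5, 0.5], [0.5, 2.5]], turning the pruned entries nonzero again; masking the update is what keeps the model sparse after fine-tuning under this assumption.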

Citation

In case you find the paper or the implementation useful for your own research, please consider citing:

@misc{wagner2026freelunchllmcompression,
      title={A Free Lunch in LLM Compression: Revisiting Retraining after Pruning}, 
      author={Moritz Wagner and Christophe Roux and Max Zimmer and Sebastian Pokutta},
      year={2026},
      eprint={2510.14444},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.14444}, 
}
