ZIB-IOL/Free-Lunch-in-LLM-Compression

A Free Lunch in LLM Compression: Revisiting Retraining after Pruning

This repository contains the code to reproduce the experiments from the paper "A Free Lunch in LLM Compression: Revisiting Retraining after Pruning". The code is based on PyTorch 2.8 and the experiment-tracking platform Weights & Biases. The results in the paper were generated using Python 3.12, CUDA 12.8, and the environment defined in requirements.txt.

Usage

The main.py file configures and starts experiments. Experiments are configured either within this file or via Weights & Biases. When the --debug flag is set, the config dictionary inside the file is used; otherwise, it is overwritten with the config dictionary provided by Weights & Biases.
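The selection logic described above can be illustrated with a minimal sketch. This is not the code from main.py: the names LOCAL_CONFIG, resolve_config and parse_args, as well as the config keys, are hypothetical and chosen only for illustration. In a real run, the tracker config would presumably come from Weights & Biases (for example from wandb.config after wandb.init()); here it is passed in as a plain dictionary so the sketch runs without the dependency.

```python
import argparse

# Hypothetical stand-in for the config dictionary defined inside main.py.
LOCAL_CONFIG = {"model": "example-model", "sparsity_type": "2:4", "prune_method": "wanda"}


def resolve_config(debug, local_config, tracker_config=None):
    """Return the config to run with.

    With debug=True the local dictionary is used as-is. Otherwise it is
    replaced by the dictionary supplied by the experiment tracker.
    """
    if debug:
        return dict(local_config)
    if tracker_config is None:
        raise ValueError("no tracker config available; pass --debug to use the local one")
    return dict(tracker_config)


def parse_args(argv=None):
    """Parse the command line; only the --debug flag is modelled here."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true")
    return parser.parse_args(argv)


# Debug run: the local dictionary wins.
args = parse_args(["--debug"])
debug_config = resolve_config(args.debug, LOCAL_CONFIG)

# Tracked run: the (assumed) sweep config replaces the local dictionary.
args = parse_args([])
sweep_config = resolve_config(args.debug, LOCAL_CONFIG, {"model": "other-model"})
```

Returning a copy rather than the original dictionary keeps later modifications of the run config from leaking back into the defaults.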

An example Weights & Biases config for a sweep pruning LLaMA-3-8B to 2:4 sparsity with Wanda and reconstructing with block size 1/2 (-1 in this code base) can be found in example_config.txt. The model with the best WikiText-2 perplexity is saved in checkpointdir/<sweep_id>. An example config for evaluating the best model found by the sweep is given in example_eval_config.txt.
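To make the terms in the example concrete: 2:4 sparsity keeps two out of every four consecutive weights, and Wanda scores each weight by its magnitude times the L2 norm of the corresponding input activation. The following framework-free sketch illustrates that idea on a single tiny weight row. It is not the implementation in prune_methods.py; the function names wanda_scores and n_m_mask and the toy numbers are assumptions made for illustration, and the actual pipeline operates on PyTorch tensors with activation norms estimated from calibration data.

```python
def wanda_scores(weights, input_norms):
    """Score each weight as |w_ij| * ||x_j||_2, one norm per input feature j."""
    return [[abs(w) * n for w, n in zip(row, input_norms)] for row in weights]


def n_m_mask(scores, n=2, m=4):
    """Keep the n highest-scoring entries in every group of m consecutive inputs."""
    mask = []
    for row in scores:
        if len(row) % m != 0:
            raise ValueError("row length must be divisible by m")
        row_mask = [0] * len(row)
        for start in range(0, len(row), m):
            group = range(start, start + m)
            # Indices of the n largest scores within this group.
            keep = sorted(group, key=lambda j: row[j], reverse=True)[:n]
            for j in keep:
                row_mask[j] = 1
        mask.append(row_mask)
    return mask


# Toy data: one output row with eight input features (two groups of four).
weights = [[0.5, -0.1, 0.3, 0.05, -2.0, 1.0, 0.2, -0.4]]
input_norms = [1.0, 10.0, 1.0, 1.0, 0.1, 1.0, 1.0, 1.0]

mask = n_m_mask(wanda_scores(weights, input_norms))
pruned = [[w * k for w, k in zip(w_row, m_row)] for w_row, m_row in zip(weights, mask)]
```

In this toy example the small weight -0.1 survives because its input norm is large, while the large weight -2.0 is pruned because its input norm is small; plain magnitude pruning would have made the opposite choice in both cases.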

The rest of the project is structured as follows:

  • caching_dummy.py: Contains a class that downloads and tokenizes the text data sets used for the experiments.
  • runner.py: Contains a class that prepares the model and data, starts the pruning and reconstruction, and manages fine-tuning.
  • custom_layers.py: Contains custom modules used for MaskLoRA PEFT after pruning.
  • peft_methods.py: Contains classes used for applying custom PEFT modules to the pruned model.
  • prune_methods.py: Contains a pipeline for pruning and reconstructing the model.
  • prune_flap.py: Contains a pipeline for pruning with FLAP.
  • utilities.py: Contains useful auxiliary functions and classes.
  • check_act_dist_main.py: Configures and starts the following script.
  • check_act_dist.py: Contains a class that measures the distances between the activations of given model checkpoints.

Citation

In case you find the paper or the implementation useful for your own research, please consider citing:

@misc{wagner2026freelunchllmcompression,
      title={A Free Lunch in LLM Compression: Revisiting Retraining after Pruning}, 
      author={Moritz Wagner and Christophe Roux and Max Zimmer and Sebastian Pokutta},
      year={2026},
      eprint={2510.14444},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.14444}, 
}
