Merged
Commits (28)
bf8a2ea
upload the adamss code
LonglongaaaGo Jan 10, 2026
9870fd9
add the peft type
LonglongaaaGo Jan 10, 2026
d5dccb6
imporve the code
LonglongaaaGo Jan 13, 2026
fab339f
fix the errors from case testing of test_custom_models.py
LonglongaaaGo Jan 14, 2026
c502453
pass the test and fix the issues discussed before
LonglongaaaGo Jan 19, 2026
635acf7
remove the print code
LonglongaaaGo Feb 6, 2026
4018423
fix the issues and fix the make style issues
LonglongaaaGo Feb 21, 2026
f935e9a
fix the isses
LonglongaaaGo Feb 28, 2026
2aeb009
Merge upstream main and resolve conflict in custom models test
LonglongaaaGo Mar 7, 2026
4b04d1d
feat: update AdamSS core logic and tests
LonglongaaaGo Mar 7, 2026
7926ec5
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Mar 15, 2026
5ac3642
update for MetaMathQA testing
LonglongaaaGo Mar 15, 2026
42e2fb3
update using the make style
LonglongaaaGo Mar 17, 2026
c0ded2d
merge previous changes
LonglongaaaGo Mar 17, 2026
69ed20a
fix the issues for run and utils.py
LonglongaaaGo Mar 17, 2026
3d079d6
fix the issues for run and utils.py
LonglongaaaGo Mar 17, 2026
2b653bb
fix the issues for run and utils.py
LonglongaaaGo Mar 17, 2026
397eac6
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Mar 20, 2026
25e2690
update for the mathQA
LonglongaaaGo Mar 20, 2026
d71496d
update for testing_common.py
LonglongaaaGo Mar 22, 2026
0161be1
update for bug fixing
LonglongaaaGo Mar 28, 2026
f707f50
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Mar 28, 2026
8c73b77
Merge remote-tracking branch 'origin/adamss' into adamss
LonglongaaaGo Mar 28, 2026
439c2c2
update
LonglongaaaGo Apr 1, 2026
c31e13d
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Apr 1, 2026
d921e26
Merge remote-tracking branch 'origin/adamss' into adamss
LonglongaaaGo Apr 1, 2026
db34eb4
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Apr 3, 2026
1c1d64f
update
LonglongaaaGo Apr 3, 2026
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -80,6 +80,8 @@
- sections:
- local: package_reference/adalora
title: AdaLoRA
- local: package_reference/adamss
title: AdaMSS
- local: package_reference/ia3
title: IA3
- local: package_reference/llama_adapter
38 changes: 38 additions & 0 deletions docs/source/package_reference/adamss.md
@@ -0,0 +1,38 @@
<!--Copyright 2026 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# AdaMSS

[AdaMSS](https://openreview.net/forum?id=8ZdWmpYxT0) (AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning) is a parameter-efficient fine-tuning method that decomposes weight matrices using SVD and clusters the decomposed space into multiple trainable subspaces. Each subspace learns independent low-rank updates while the original weights remain frozen. AdaMSS also supports Adaptive Subspace Allocation (ASA), which dynamically prunes less important subspaces during training based on gradient information.

The abstract from the paper is:

> We propose AdaMSS, an adaptive multi-subspace approach for parameter-efficient fine-tuning of large models. Unlike traditional parameter-efficient fine-tuning methods that operate within a large single subspace of the network weights, AdaMSS leverages subspace segmentation to obtain multiple smaller subspaces and adaptively reduces the number of trainable parameters during training, ultimately updating only those associated with a small subset of subspaces most relevant to the target downstream task. By using the lowest-rank representation, AdaMSS achieves more compact expressiveness and finer tuning of the model parameters. Theoretical analyses demonstrate that AdaMSS has better generalization guarantee than LoRA, PiSSA, and other single-subspace low-rank-based methods. Extensive experiments across image classification, natural language understanding, and natural language generation tasks show that AdaMSS achieves comparable performance to full fine-tuning and outperforms other parameter-efficient fine-tuning methods in most cases, all while requiring fewer trainable parameters. Notably, on the ViT-Large model, AdaMSS achieves 4.7% higher average accuracy than LoRA across seven tasks, using just 15.4% of the trainable parameters. On RoBERTa-Large, AdaMSS outperforms PiSSA by 7% in average accuracy across six tasks while reducing the number of trainable parameters by approximately 94.4%. These results demonstrate the effectiveness of AdaMSS in parameter-efficient fine-tuning. The code for AdaMSS is available at https://github.com/jzheng20/AdaMSS.


AdaMSS currently has the following constraints:
- Only `nn.Linear` layers are supported.
- Requires scikit-learn for the KMeans clustering step.

If these constraints don't work for your use case, consider other methods instead.
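
A minimal usage sketch (the base model and hyperparameter values here are illustrative, not prescriptive):

```python
from transformers import AutoModelForSequenceClassification
from peft import AdamssConfig, get_peft_model

base_model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
config = AdamssConfig(
    r=100,             # rank of the initial SVD decomposition
    num_subspaces=10,  # number of subspaces after clustering
    subspace_rank=1,   # trainable rank per subspace
    target_modules=["query", "value"],
)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```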

## AdamssConfig

[[autodoc]] tuners.adamss.config.AdamssConfig

## AdamssModel

[[autodoc]] tuners.adamss.model.AdamssModel
203 changes: 203 additions & 0 deletions examples/adamss_finetuning/README.md
@@ -0,0 +1,203 @@
# AdaMSS Fine-tuning

## Introduction

AdaMSS (Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning) is a parameter-efficient fine-tuning method that decomposes weight matrices into low-rank subspaces via SVD. It trains only **~0.07%** of the original parameter count (e.g., 59K parameters for ViT-Base vs. 86M for full fine-tuning) while maintaining competitive performance.

The method optionally supports **ASA** (Adaptive Subspace Allocation) for dynamic subspace selection during training, further improving efficiency and performance.

See the [paper](https://neurips.cc/virtual/2025/poster/119606) for more details.


## Installation & Quick Test

Install from local source:
```bash
cd peft-main && pip install -e .
pip install transformers datasets torch torchvision evaluate accelerate scikit-learn
```

Verify installation:
```bash
python -c "from peft import AdamssConfig; print('AdaMSS ready')"
```

## Detailed Code Explanation

**Core AdaMSS Configuration:**
```python
from transformers import AutoModelForImageClassification
from peft import AdamssConfig, get_peft_model

# Illustrative base model; AdaMSS targets nn.Linear modules such as the
# attention projections below.
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=10
)

# Configure AdaMSS with ASA
config = AdamssConfig(
    r=100,                              # SVD rank (full decomposition rank)
    num_subspaces=10,                   # Number of subspaces (K) - initial capacity
    subspace_rank=3,                    # Rank per subspace (ri) - use 1 for NLU, 3 for vision
    target_modules=["query", "value"],  # Target attention layers
    use_asa=True,                       # Enable Adaptive Subspace Allocation
    asa_target_subspaces=5,             # Target active subspaces (ASA reduces K→5)
    init_warmup=50,                     # Start ASA after 50 steps
    final_warmup=1000,                  # Complete masking by step 1000
    mask_interval=100,                  # Update mask every 100 steps
    modules_to_save=["classifier"],     # Modules to train without decomposition
)
peft_model = get_peft_model(model, config)
```

**Option A – With HuggingFace Trainer (callback):**
```python
from transformers import Trainer
from peft.tuners.adamss.asa_callback import AdamssAsaCallback

# The callback is a thin wrapper around model.update_and_allocate()
trainer = Trainer(
    model=peft_model,
    callbacks=[AdamssAsaCallback()],
    # ... other arguments
)
trainer.train()
```

**Option B – Custom training loop (no Trainer needed):**
```python
for step, batch in enumerate(dataloader):
    loss = peft_model(**batch).loss
    loss.backward()
    optimizer.step()
    # Called after optimizer.step() but before zero_grad() so ASA can still
    # read this step's gradients when updating subspace importance.
    peft_model.base_model.update_and_allocate(step)  # ← all ASA logic in one call
    optimizer.zero_grad()
```

**Key Points:**
- **Parameterization**: Total params = `r × (d_in + d_out)`, split into K subspaces of rank `ri` each (see the check below)
- **ASA Mechanism**: Dynamically selects the `asa_target_subspaces` most important subspaces from the initial `num_subspaces`
- **Warmup Schedule**: ASA gradually increases masking strength from `init_warmup` to `final_warmup`
- **Vision vs NLU**: Use `subspace_rank=3` for vision tasks, `subspace_rank=1` for NLU tasks
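
To sanity-check the parameter budget a given configuration produces, you can use PEFT's generic helper after wrapping the model (not AdaMSS-specific):

```python
# Prints trainable vs. total parameter counts; for ViT-Base with the settings
# above this should land in the ballpark of the ~0.07% figure quoted earlier.
peft_model.print_trainable_parameters()
```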

## Use the training example scripts

### Vision Tasks (Image Classification)

Run the provided script with your configuration:
```bash
python examples/adamss_finetuning/image_classification_adamss_asa.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
--adamss_r 100 \
--adamss_k 10 \
--adamss_ri 3 \
--use_asa \
--asa_target_subspaces 5 \
--output_dir ./output
```

### NLU Tasks (GLUE Benchmark)

Run GLUE tasks (e.g., CoLA) with ASA:
```bash
python examples/adamss_finetuning/glue_adamss_asa_example.py \
--dataset_name cola \
--adamss_r 100 \
--adamss_k 10 \
--adamss_ri 1 \
--use_asa \
--asa_target_subspaces 5 \
--num_epochs 100 \
--batch_size 32 \
--output_dir ./output_cola_asa
```

Without ASA (fixed K=10):
```bash
python examples/adamss_finetuning/glue_adamss_asa_example.py \
--dataset_name cola \
--adamss_r 100 \
--adamss_k 10 \
--adamss_ri 1 \
--num_epochs 100 \
--batch_size 32 \
--output_dir ./output_cola_no_asa
```

### AdamssConfig Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `r` | int | 100 | SVD decomposition rank |
| `num_subspaces` | int | 10 | Number of subspaces (K) |
| `subspace_rank` | int | 3 | Rank per subspace (ri) |
| `target_modules` | list | - | Modules to apply AdaMSS to (e.g., `["query", "value"]`) |
| `use_asa` | bool | False | Enable Adaptive Subspace Allocation |
| `asa_target_subspaces` | int | None | Target active subspaces when ASA enabled |
| `modules_to_save` | list | None | Modules to train without decomposition |
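
For reference, a sketch of a minimal configuration with ASA left disabled, using the defaults from the table above (all `num_subspaces` subspaces then stay active for the entire run):

```python
from peft import AdamssConfig

# Fixed-capacity AdaMSS: no subspace pruning during training.
config = AdamssConfig(
    r=100,
    num_subspaces=10,
    subspace_rank=1,  # 1 for NLU tasks, 3 for vision tasks
    target_modules=["query", "value"],
)
```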

### AdamssAsaCallback

The ASA callback reads all parameters from `AdamssConfig`. Import it directly:

```python
from peft.tuners.adamss.asa_callback import AdamssAsaCallback
```

ASA-related config parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `init_warmup` | int | 50 | Steps before starting masking |
| `final_warmup` | int | 1000 | Steps to reach target active subspaces |
| `mask_interval` | int | 100 | Steps between subspace selection updates |
| `asa_importance_beta` | float | 0.85 | EMA decay for importance tracking |
| `asa_uncertainty_beta` | float | 0.85 | EMA decay for uncertainty tracking |
| `asa_schedule_exponent` | float | 3.0 | Exponent for masking schedule |
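
`asa_schedule_exponent` controls how sharply masking ramps up between the two warmup points. As a rough illustration only (we assume a polynomial decay in the style of AdaLoRA's budget schedule; the exact rule lives in the AdaMSS tuner source), the number of active subspaces would evolve roughly like:

```python
def active_subspaces(step, k_init=10, k_target=5,
                     init_warmup=50, final_warmup=1000, exponent=3.0):
    # Hypothetical cubic-style decay from k_init to k_target between the
    # warmup points; check peft.tuners.adamss for the actual schedule.
    if step < init_warmup:
        return k_init
    if step >= final_warmup:
        return k_target
    progress = (step - init_warmup) / (final_warmup - init_warmup)
    return round(k_target + (k_init - k_target) * (1.0 - progress) ** exponent)
```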


## Experimental Results

### NLU Tasks (GLUE Benchmark)

Results with AdaMSS + ASA (100 epochs, seed=0):

| Task | Model | AdaMSS Params | Metric | Score |
|------|-------|---------------|--------|-------|
| CoLA | RoBERTa-base | 27.0K (ASA K→5) | Matthews | **0.6466** |
| CoLA | RoBERTa-large | 64.8K (ASA K→5) | Matthews | **0.7093** |
| MRPC | RoBERTa-base | 27.2K (ASA K→5) | Accuracy | **0.8824** |
| MRPC | RoBERTa-large | 66.7K (ASA K→5) | Accuracy | **0.9044** |

**Notes:**
- Configuration: r=100, K=10→5 (ASA), ri=1
- Reported parameter counts are the active AdaMSS params under ASA (5 of 10 subspaces selected)
- Full AdaMSS capacity: 97K (large) / 42K (base)
- Training: 100 epochs, batch_size=32, warmup_ratio=0.06

### Vision Tasks (Image Classification)

Results with AdaMSS on Stanford Cars (10 epochs, seed=0):

| Model | Method | AdaMSS Params | Test Accuracy |
|-------|--------|---------------|---------------|
| ViT-Base | AdaMSS (no ASA) | 121K (K=10) | **82.15%** |
| ViT-Base | AdaMSS + ASA | 75.0K (K→5) | **80.45%** |

**Notes:**
- Configuration: r=100, K=10, ri=3, 10 epochs, batch_size=32
- ASA dynamically selects 5 out of 10 subspaces (75K active from 121K total)



## Citation

If you use AdaMSS in your research, please cite:

```bibtex
@inproceedings{zheng2025adamss,
title={AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning},
author={Zheng, Jingjing and Lu, Wanglong and Dong, Yiming and Ji, Chaojie and Cao, Yankai and Lin, Zhouchen},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
}
```

## Reference

- [AdaMSS Paper](https://neurips.cc/virtual/2025/loc/san-diego/poster/119606)
- [PEFT Documentation](https://huggingface.co/docs/peft)