Merged
Commits (28)
bf8a2ea
upload the adamss code
LonglongaaaGo Jan 10, 2026
9870fd9
add the peft type
LonglongaaaGo Jan 10, 2026
d5dccb6
imporve the code
LonglongaaaGo Jan 13, 2026
fab339f
fix the errors from case testing of test_custom_models.py
LonglongaaaGo Jan 14, 2026
c502453
pass the test and fix the issues discussed before
LonglongaaaGo Jan 19, 2026
635acf7
remove the print code
LonglongaaaGo Feb 6, 2026
4018423
fix the issues and fix the make style issues
LonglongaaaGo Feb 21, 2026
f935e9a
fix the isses
LonglongaaaGo Feb 28, 2026
2aeb009
Merge upstream main and resolve conflict in custom models test
LonglongaaaGo Mar 7, 2026
4b04d1d
feat: update AdamSS core logic and tests
LonglongaaaGo Mar 7, 2026
7926ec5
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Mar 15, 2026
5ac3642
update for MetaMathQA testing
LonglongaaaGo Mar 15, 2026
42e2fb3
update using the make style
LonglongaaaGo Mar 17, 2026
c0ded2d
merge previous changes
LonglongaaaGo Mar 17, 2026
69ed20a
fix the issues for run and utils.py
LonglongaaaGo Mar 17, 2026
3d079d6
fix the issues for run and utils.py
LonglongaaaGo Mar 17, 2026
2b653bb
fix the issues for run and utils.py
LonglongaaaGo Mar 17, 2026
397eac6
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Mar 20, 2026
25e2690
update for the mathQA
LonglongaaaGo Mar 20, 2026
d71496d
update for testing_common.py
LonglongaaaGo Mar 22, 2026
0161be1
update for bug fixing
LonglongaaaGo Mar 28, 2026
f707f50
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Mar 28, 2026
8c73b77
Merge remote-tracking branch 'origin/adamss' into adamss
LonglongaaaGo Mar 28, 2026
439c2c2
update
LonglongaaaGo Apr 1, 2026
c31e13d
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Apr 1, 2026
d921e26
Merge remote-tracking branch 'origin/adamss' into adamss
LonglongaaaGo Apr 1, 2026
db34eb4
Merge branch 'huggingface:main' into adamss
LonglongaaaGo Apr 3, 2026
1c1d64f
update
LonglongaaaGo Apr 3, 2026
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -80,6 +80,8 @@
- sections:
- local: package_reference/adalora
title: AdaLoRA
- local: package_reference/adamss
title: AdaMSS
- local: package_reference/ia3
title: IA3
- local: package_reference/llama_adapter
38 changes: 38 additions & 0 deletions docs/source/package_reference/adamss.md
@@ -0,0 +1,38 @@
<!--Copyright 2026 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# AdaMSS

[AdaMSS](https://openreview.net/forum?id=8ZdWmpYxT0) (AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning) is a parameter-efficient fine-tuning method that decomposes weight matrices using SVD and clusters the decomposed space into multiple trainable subspaces. Each subspace learns independent low-rank updates while the original weights remain frozen. AdaMSS also supports Adaptive Subspace Allocation (ASA), which dynamically prunes less important subspaces during training based on gradient information.

The abstract from the paper is:

> We propose AdaMSS, an adaptive multi-subspace approach for parameter-efficient fine-tuning of large models. Unlike traditional parameter-efficient fine-tuning methods that operate within a large single subspace of the network weights, AdaMSS leverages subspace segmentation to obtain multiple smaller subspaces and adaptively reduces the number of trainable parameters during training, ultimately updating only those associated with a small subset of subspaces most relevant to the target downstream task. By using the lowest-rank representation, AdaMSS achieves more compact expressiveness and finer tuning of the model parameters. Theoretical analyses demonstrate that AdaMSS has better generalization guarantee than LoRA, PiSSA, and other single-subspace low-rank-based methods. Extensive experiments across image classification, natural language understanding, and natural language generation tasks show that AdaMSS achieves comparable performance to full fine-tuning and outperforms other parameter-efficient fine-tuning methods in most cases, all while requiring fewer trainable parameters. Notably, on the ViT-Large model, AdaMSS achieves 4.7% higher average accuracy than LoRA across seven tasks, using just 15.4% of the trainable parameters. On RoBERTa-Large, AdaMSS outperforms PiSSA by 7% in average accuracy across six tasks while reducing the number of trainable parameters by approximately 94.4%. These results demonstrate the effectiveness of AdaMSS in parameter-efficient fine-tuning. The code for AdaMSS is available at https://github.com/jzheng20/AdaMSS.


AdaMSS currently has the following constraints:
- Only `nn.Linear` layers are supported.
- Requires scikit-learn for the KMeans clustering step.

If these constraints don't work for your use case, consider other methods instead.
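
A minimal usage sketch (the base model and hyperparameter values here are illustrative, not prescriptive):

```python
from transformers import AutoModelForSequenceClassification
from peft import AdamssConfig, get_peft_model

base_model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
config = AdamssConfig(
    r=100,             # rank of the initial SVD decomposition
    num_subspaces=10,  # number of subspaces after clustering
    subspace_rank=1,   # trainable rank per subspace
    target_modules=["query", "value"],
)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```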

## AdamssConfig

[[autodoc]] tuners.adamss.config.AdamssConfig

## AdamssModel

[[autodoc]] tuners.adamss.model.AdamssModel
203 changes: 203 additions & 0 deletions examples/adamss_finetuning/README.md
@@ -0,0 +1,203 @@
# AdaMSS Fine-tuning

## Introduction

AdaMSS (Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning) is a parameter-efficient fine-tuning method that decomposes weight matrices into low-rank subspaces via SVD. It trains only **~0.07%** of the original parameter count (e.g., 59K parameters for ViT-Base vs. 86M for full fine-tuning) while maintaining competitive performance.

The method optionally supports **ASA** (Adaptive Subspace Allocation) for dynamic subspace selection during training, further improving efficiency and performance.

See the [paper](https://neurips.cc/virtual/2025/poster/119606) for more details.


## Installation & Quick Test

Install from local source:
```bash
cd peft-main && pip install -e .
pip install transformers datasets torch torchvision evaluate accelerate scikit-learn
```

Verify installation:
```bash
python -c "from peft import AdamssConfig; print('AdaMSS ready')"
```

## Detailed Code Explanation

**Core AdaMSS Configuration:**
```python
from transformers import AutoModelForImageClassification
from peft import AdamssConfig, get_peft_model

# Illustrative base model; AdaMSS targets nn.Linear modules such as the
# attention projections below.
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=10
)

# Configure AdaMSS with ASA
config = AdamssConfig(
    r=100,                              # SVD rank (full decomposition rank)
    num_subspaces=10,                   # Number of subspaces (K) - initial capacity
    subspace_rank=3,                    # Rank per subspace (ri) - use 1 for NLU, 3 for vision
    target_modules=["query", "value"],  # Target attention layers
    use_asa=True,                       # Enable Adaptive Subspace Allocation
    asa_target_subspaces=5,             # Target active subspaces (ASA reduces K→5)
    init_warmup=50,                     # Start ASA after 50 steps
    final_warmup=1000,                  # Complete masking by step 1000
    mask_interval=100,                  # Update mask every 100 steps
    modules_to_save=["classifier"],     # Modules to train without decomposition
)
peft_model = get_peft_model(model, config)
```

**Option A – With HuggingFace Trainer (callback):**
```python
from transformers import Trainer
from peft.tuners.adamss.asa_callback import AdamssAsaCallback

# The callback is a thin wrapper around model.update_and_allocate()
trainer = Trainer(
    model=peft_model,
    callbacks=[AdamssAsaCallback()],
    # ... other arguments
)
trainer.train()
```

**Option B – Custom training loop (no Trainer needed):**
```python
for step, batch in enumerate(dataloader):
    loss = peft_model(**batch).loss
    loss.backward()
    optimizer.step()
    # Called after optimizer.step() but before zero_grad() so ASA can still
    # read this step's gradients when updating subspace importance.
    peft_model.base_model.update_and_allocate(step)  # ← all ASA logic in one call
    optimizer.zero_grad()
```

**Key Points:**
- **Parameterization**: Total params = `r × (d_in + d_out)`, split into K subspaces of rank `ri` each (see the check below)
- **ASA Mechanism**: Dynamically selects the `asa_target_subspaces` most important subspaces from the initial `num_subspaces`
- **Warmup Schedule**: ASA gradually increases masking strength from `init_warmup` to `final_warmup`
- **Vision vs NLU**: Use `subspace_rank=3` for vision tasks, `subspace_rank=1` for NLU tasks
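
To sanity-check the parameter budget a given configuration produces, you can use PEFT's generic helper after wrapping the model (not AdaMSS-specific):

```python
# Prints trainable vs. total parameter counts; for ViT-Base with the settings
# above this should land in the ballpark of the ~0.07% figure quoted earlier.
peft_model.print_trainable_parameters()
```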

## Use the training example scripts

### Vision Tasks (Image Classification)

Run the provided script with your configuration:
```bash
python examples/adamss_finetuning/image_classification_adamss_asa.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name cifar10 \
--adamss_r 100 \
--adamss_k 10 \
--adamss_ri 3 \
--use_asa \
--asa_target_subspaces 5 \
--output_dir ./output
```

### NLU Tasks (GLUE Benchmark)

Run GLUE tasks (e.g., CoLA) with ASA:
```bash
python examples/adamss_finetuning/glue_adamss_asa_example.py \
--dataset_name cola \
--adamss_r 100 \
--adamss_k 10 \
--adamss_ri 1 \
--use_asa \
--asa_target_subspaces 5 \
--num_epochs 100 \
--batch_size 32 \
--output_dir ./output_cola_asa
```

Without ASA (fixed K=10):
```bash
python examples/adamss_finetuning/glue_adamss_asa_example.py \
--dataset_name cola \
--adamss_r 100 \
--adamss_k 10 \
--adamss_ri 1 \
--num_epochs 100 \
--batch_size 32 \
--output_dir ./output_cola_no_asa
```

### AdamssConfig Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `r` | int | 100 | SVD decomposition rank |
| `num_subspaces` | int | 10 | Number of subspaces (K) |
| `subspace_rank` | int | 3 | Rank per subspace (ri) |
| `target_modules` | list | - | Modules to apply AdaMSS to (e.g., `["query", "value"]`) |
| `use_asa` | bool | False | Enable Adaptive Subspace Allocation |
| `asa_target_subspaces` | int | None | Target active subspaces when ASA enabled |
| `modules_to_save` | list | None | Modules to train without decomposition |
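
For reference, a sketch of a minimal configuration with ASA left disabled, using the defaults from the table above (all `num_subspaces` subspaces then stay active for the entire run):

```python
from peft import AdamssConfig

# Fixed-capacity AdaMSS: no subspace pruning during training.
config = AdamssConfig(
    r=100,
    num_subspaces=10,
    subspace_rank=1,  # 1 for NLU tasks, 3 for vision tasks
    target_modules=["query", "value"],
)
```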

### AdamssAsaCallback

The ASA callback reads all parameters from `AdamssConfig`. Import it directly:

```python
from peft.tuners.adamss.asa_callback import AdamssAsaCallback
```

ASA-related config parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `init_warmup` | int | 50 | Steps before starting masking |
| `final_warmup` | int | 1000 | Steps to reach target active subspaces |
| `mask_interval` | int | 100 | Steps between subspace selection updates |
| `asa_importance_beta` | float | 0.85 | EMA decay for importance tracking |
| `asa_uncertainty_beta` | float | 0.85 | EMA decay for uncertainty tracking |
| `asa_schedule_exponent` | float | 3.0 | Exponent for masking schedule |
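
`asa_schedule_exponent` controls how sharply masking ramps up between the two warmup points. As a rough illustration only (we assume a polynomial decay in the style of AdaLoRA's budget schedule; the exact rule lives in the AdaMSS tuner source), the number of active subspaces would evolve roughly like:

```python
def active_subspaces(step, k_init=10, k_target=5,
                     init_warmup=50, final_warmup=1000, exponent=3.0):
    # Hypothetical cubic-style decay from k_init to k_target between the
    # warmup points; check peft.tuners.adamss for the actual schedule.
    if step < init_warmup:
        return k_init
    if step >= final_warmup:
        return k_target
    progress = (step - init_warmup) / (final_warmup - init_warmup)
    return round(k_target + (k_init - k_target) * (1.0 - progress) ** exponent)
```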


## Experimental Results

### NLU Tasks (GLUE Benchmark)

Results with AdaMSS + ASA (100 epochs, seed=0):

| Task | Model | AdaMSS Params | Metric | Score |
|------|-------|---------------|--------|-------|
| CoLA | RoBERTa-base | 27.0K (ASA K→5) | Matthews | **0.6466** |
| CoLA | RoBERTa-large | 64.8K (ASA K→5) | Matthews | **0.7093** |
| MRPC | RoBERTa-base | 27.2K (ASA K→5) | Accuracy | **0.8824** |
| MRPC | RoBERTa-large | 66.7K (ASA K→5) | Accuracy | **0.9044** |

**Notes:**
- Configuration: r=100, K=10→5 (ASA), ri=1
- Reported parameter counts are the active AdaMSS params under ASA (5 of 10 subspaces selected)
- Full AdaMSS capacity: 97K (large) / 42K (base)
- Training: 100 epochs, batch_size=32, warmup_ratio=0.06

### Vision Tasks (Image Classification)

Results with AdaMSS on Stanford Cars (10 epochs, seed=0):

| Model | Method | AdaMSS Params | Test Accuracy |
|-------|--------|---------------|---------------|
| ViT-Base | AdaMSS (no ASA) | 121K (K=10) | **82.15%** |
| ViT-Base | AdaMSS + ASA | 75.0K (K→5) | **80.45%** |

**Notes:**
- Configuration: r=100, K=10, ri=3, 10 epochs, batch_size=32
- ASA dynamically selects 5 out of 10 subspaces (75K active from 121K total)



## Citation

If you use AdaMSS in your research, please cite:

```bibtex
@inproceedings{zheng2025adamss,
title={AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning},
author={Zheng, Jingjing and Lu, Wanglong and Dong, Yiming and Ji, Chaojie and Cao, Yankai and Lin, Zhouchen},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
}
```

## Reference

- [AdaMSS Paper](https://neurips.cc/virtual/2025/loc/san-diego/poster/119606)
- [PEFT Documentation](https://huggingface.co/docs/peft)