Skip to content

fix: eliminate cross-terms bug when combining adapters with different weights#3013

Draft
wingding12 wants to merge 1 commit intohuggingface:mainfrom
wingding12:fix/negative-weight-adapter-combination
Draft

fix: eliminate cross-terms bug when combining adapters with different weights#3013
wingding12 wants to merge 1 commit intohuggingface:mainfrom
wingding12:fix/negative-weight-adapter-combination

Conversation

@wingding12
Copy link

Summary

Fixes #3004

When using add_weighted_adapter with combination_type in ['linear', 'ties', 'dare_linear', 'dare_ties', 'magnitude_prune'], the previous implementation combined A matrices and B matrices separately before multiplying. This introduced cross-terms that corrupted the result when combining multiple adapters with different weights.

The Bug

For example, with two adapters (A1, B1) and (A2, B2) and weights [1, -1]:

  • Expected: A1@B1 - A2@B2
  • Old (buggy): (A1-A2)@(B1+B2) = A1@B1 + A1@B2 - A2@B1 - A2@B2

The cross-terms A1@B2 and A2@B1 were incorrectly added, causing significant errors (in my tests, ~1.55 error norm vs ~1.55 expected norm - essentially completely wrong).

The Fix

This PR computes the full delta weight (A@B) for each adapter first, then combines them, and finally decomposes the result back to A and B using SVD. This ensures mathematically correct results.

Note on Approximation

The decomposition uses truncated SVD with the same rank as the input adapters, which introduces some approximation error. This is the same behavior as the existing SVD combination types (svd, ties_svd, etc.) and is expected.

In my tests:

  • Old buggy error: ~1.55 (completely wrong)
  • New fixed error (rank=4): ~0.32 (expected rank truncation, same as combination_type='svd')
  • New fixed error (rank=8): ~0.00001 (near exact with sufficient rank)

Users who need higher accuracy can use combination_type='svd' with a higher svd_rank parameter.

Test Plan

I tested manually with:

  1. Single adapter with negative weight (-1.0) - works correctly
  2. Two identical adapters with [1.0, -1.0] - cancels to zero correctly
  3. Two different adapters with [1.0, -1.0] - now matches expected result (within SVD truncation error)
  4. Three adapters with mixed weights [2.0, -1.0, 0.5] - works correctly
  5. All combination types (linear, ties, dare_linear, dare_ties, magnitude_prune) - all work

Also includes Conv2d layer handling (following the same pattern as the existing SVD implementation).

… weights

When using `add_weighted_adapter` with combination_type in ['linear', 'ties',
'dare_linear', 'dare_ties', 'magnitude_prune'], the previous implementation
combined A matrices and B matrices separately before multiplying. This
introduced cross-terms that corrupted the result when combining multiple
adapters with different weights.

For example, with two adapters (A1, B1) and (A2, B2) and weights [1, -1]:
- Expected: A1@B1 - A2@B2
- Old (buggy): (A1-A2)@(B1+B2) = A1@B1 + A1@B2 - A2@B1 - A2@B2

The cross-terms A1@B2 and A2@B1 were incorrectly added.

This fix computes the full delta weight (A@B) for each adapter first, then
combines them, and finally decomposes the result back to A and B using SVD.
This ensures mathematically correct results.

Note: The decomposition uses truncated SVD with the same rank as the input
adapters, which introduces some approximation error. This is the same behavior
as the existing SVD combination types and is expected. Users who need higher
accuracy can use combination_type='svd' with a higher svd_rank.

Fixes huggingface#3004
@BenjaminBossan
Copy link
Member

Thanks a lot for this PR to fix the incorrect merging behavior. It would be crucial to add tests for this. Do you still have your testing code available? Let's add it in the form of unit tests. If you need assistance with that, please let me know.

@BenjaminBossan
Copy link
Member

ping @wingding12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Combining adapters linearly with negative weights is broken

2 participants