# Add native DeepSeek-V3.2 support #44481

XingyuHu109 wants to merge 14 commits into huggingface:main
Conversation
adc22b1 to 03d1246
Pull request overview
This PR adds native Hugging Face Transformers support for the new DeepSeek-V3.2 architecture (deepseek_v32) so official checkpoints (e.g. deepseek-ai/DeepSeek-V3.2) resolve through standard auto-classes without requiring trust_remote_code.
Changes:
- Introduces the `deepseek_v32` model family (config + PyTorch modeling), registers it in auto-class mappings, and adds docs/tests.
- Improves robustness of dynamic attention/experts-implementation detection (avoids `KeyError` when `sys.modules` lacks the model module entry).
- Extends the `accelerate` integration to support `preload_module_classes` dispatch (relevant for disk-offloaded MoE blocks) and fixes FP8 quantizer device-map validation precedence.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/transformers/models/deepseek_v32/modular_deepseek_v32.py` | Modular source defining DeepSeek-V3.2 config/model with DSA indexer + MoE. |
| `src/transformers/models/deepseek_v32/configuration_deepseek_v32.py` | Generated config for `deepseek_v32`. |
| `src/transformers/models/deepseek_v32/modeling_deepseek_v32.py` | Generated PyTorch implementation + generation support. |
| `src/transformers/models/deepseek_v32/__init__.py` | Lazy import plumbing for the new model package. |
| `src/transformers/models/auto/configuration_auto.py` | Registers the `deepseek_v32` config in the auto config mapping + display name. |
| `src/transformers/models/auto/modeling_auto.py` | Registers the `deepseek_v32` model + causal LM in the auto model mappings. |
| `src/transformers/models/__init__.py` | Exposes the new model package under `transformers.models`. |
| `tests/models/deepseek_v32/test_modeling_deepseek_v32.py` | New unit tests validating config fields, auto-class resolution, caching shapes, and disk-offloaded MoE behavior. |
| `docs/source/en/model_doc/deepseek_v32.md` | New model documentation page + usage example. |
| `docs/source/en/_toctree.yml` | Adds the DeepSeek-V3.2 docs page to the docs navigation. |
| `src/transformers/integrations/accelerate.py` | Adds `preload_module_classes` dispatch support with a compatibility shim for hook recursion. |
| `src/transformers/modeling_utils.py` | Makes `_can_set_attn_implementation` / `_can_set_experts_implementation` robust to missing `sys.modules` entries. |
| `src/transformers/quantizers/quantizer_finegrained_fp8.py` | Fixes device-map validation precedence and updates `isinstance` check style. |
| `tests/quantization/finegrained_fp8/test_fp8.py` | Adds a test ensuring pre-quantized FP8 models allow disk offload in device maps. |
| `tests/utils/test_modeling_utils.py` | Adds a regression test for a missing `sys.modules` module-cache entry. |
| `utils/check_config_attributes.py` | Allows V3.2 "metadata" fields not referenced by modeling code. |
| `src/transformers/conversion_mapping.py` / `docs/source/en/weightconverter.md` | Adds `deepseek_v32` to the weight conversion pattern mapping. |
```python
    Compatibility shim for `accelerate` releases that keep recursing into children after attaching a preloaded
    parent block hook. When `preload_module_classes` is active, the parent hook already manages its submodules.
    """
    if not isinstance(execution_device, Mapping) and not isinstance(offload, dict):
```
In `_attach_align_device_hook_on_blocks_for_preload`, the early-return branch checks `not isinstance(offload, dict)`, but later the function treats `offload` as a generic `Mapping`. If `accelerate` ever passes a non-dict mapping (e.g. `types.MappingProxyType`), this condition will be true and the code will incorrectly treat the mapping like a scalar boolean, potentially attaching the wrong hooks. Use `Mapping` consistently here (i.e. `not isinstance(offload, Mapping)`) so the scalar-vs-mapping logic is correct.
```diff
-    if not isinstance(execution_device, Mapping) and not isinstance(offload, dict):
+    if not isinstance(execution_device, Mapping) and not isinstance(offload, Mapping):
```
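The distinction matters because a read-only mapping passes the `Mapping` check but not the `dict` check; a quick illustration:

```python
from collections.abc import Mapping
from types import MappingProxyType

# A read-only view over a dict: it is a Mapping, but not a dict subclass,
# so an `isinstance(offload, dict)` check would wrongly send it down the
# scalar branch of the hook-attachment logic.
offload = MappingProxyType({"model.layers.0": True})

print(isinstance(offload, dict))     # False
print(isinstance(offload, Mapping))  # True
```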
```python
>>> model = DeepseekV32ForCausalLM.from_pretrained("meta-deepseek_v32/DeepseekV32-2-7b-hf")
>>> tokenizer = AutoTokenizer.from_pretrained("meta-deepseek_v32/DeepseekV32-2-7b-hf")
```
The docstring example references a model repo (meta-deepseek_v32/DeepseekV32-2-7b-hf) that does not match the official checkpoint this PR adds native support for (deepseek-ai/DeepSeek-V3.2). This will mislead users copying the snippet; please update the example to use the official model id (and matching tokenizer).
```diff
->>> model = DeepseekV32ForCausalLM.from_pretrained("meta-deepseek_v32/DeepseekV32-2-7b-hf")
->>> tokenizer = AutoTokenizer.from_pretrained("meta-deepseek_v32/DeepseekV32-2-7b-hf")
+>>> model = DeepseekV32ForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V3.2")
+>>> tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2")
```
```python
@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
def test_left_padding_compatibility(self):
    pass

@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
def test_sdpa_padding_matches_padding_free_with_position_ids(self):
    pass

@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
```
Several tests are skipped with reasons phrased as uncertainty (e.g. "Not sure MoE can pass this..."). Skip reasons should be specific and actionable (ideally referencing a known limitation or a tracking issue) so it’s clear whether this is an expected permanent limitation or a temporary gap to fix.
```diff
-@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
-def test_left_padding_compatibility(self):
-    pass
-@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
-def test_sdpa_padding_matches_padding_free_with_position_ids(self):
-    pass
-@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
+@unittest.skip("Skipped: MoE routing with the DSA indexer produces non-deterministic outputs with respect to padding, so left-padding compatibility cannot be reliably tested.")
+def test_left_padding_compatibility(self):
+    pass
+@unittest.skip("Skipped: MoE routing with the DSA indexer produces non-deterministic outputs with respect to padding, so SDPA vs padding-free behavior with position_ids cannot be reliably compared.")
+def test_sdpa_padding_matches_padding_free_with_position_ids(self):
+    pass
+@unittest.skip("Skipped: MoE routing with the DSA indexer produces non-deterministic outputs with respect to padding, making this overfitting test unreliable.")
```
All checks are green now, and the DeepSeek-V3.2 native load/generation path has been validated on this branch. Would appreciate a review when someone working in this area has time.

[For maintainers] Suggested jobs to run (before merge): `run-slow: auto, deepseek_v32, finegrained_fp8`
cc @ArthurZucker! |
Summary
This PR adds native Transformers support for DeepSeek-V3.2.
It introduces a new
deepseek_v32model family so the official checkpoints resolve through the standard auto classes withouttrust_remote_code. The implementation keeps the DeepSeek-V3 MoE structure and plugs in the in-tree DSA attention/indexer path that V3.2 uses. Docs and a dedicated test module are included as well.Closes #41196.
Validation
- `PYTHONPATH=src python -m pytest tests/models/deepseek_v32/test_modeling_deepseek_v32.py -q` → 123 passed, 129 skipped
- `PYTHONPATH=src python -m ruff check src/transformers/models/deepseek_v32 tests/models/deepseek_v32`
- `AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3.2")` → `DeepseekV32Config`
- `AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V3.2", ...)` → `DeepseekV32ForCausalLM`
- Generation with the `deepseek-ai/DeepSeek-V3.2` checkpoint completed successfully

Note

The public tokenizer config does not currently ship a `chat_template`, so `apply_chat_template(...)` still needs an explicit template.
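Until the checkpoint ships a `chat_template`, callers can pass one explicitly via the `chat_template=` argument. The ChatML-style template below is purely illustrative (it is NOT the model's official chat format); the pure-Python rendering mirrors what `tokenizer.apply_chat_template(messages, chat_template=CHAT_TEMPLATE, tokenize=False)` would produce for it:

```python
# Illustrative Jinja template only -- NOT the official DeepSeek-V3.2 format.
CHAT_TEMPLATE = (
    "{% for m in messages %}<|{{ m['role'] }}|>{{ m['content'] }}\n{% endfor %}"
)

messages = [{"role": "user", "content": "Hello"}]

# With transformers installed, the equivalent call would be:
#   tokenizer.apply_chat_template(messages, chat_template=CHAT_TEMPLATE, tokenize=False)
# Rendered by hand here to show the shape of the output:
rendered = "".join(f"<|{m['role']}|>{m['content']}\n" for m in messages)
print(rendered)  # <|user|>Hello
```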