
Add native DeepSeek-V3.2 support#44481

Open
XingyuHu109 wants to merge 14 commits into huggingface:main from XingyuHu109:add-deepseek-v32-support

Conversation


@XingyuHu109 XingyuHu109 commented Mar 5, 2026

Summary

This PR adds native Transformers support for DeepSeek-V3.2.

It introduces a new deepseek_v32 model family so the official checkpoints resolve through the standard auto classes without trust_remote_code. The implementation keeps the DeepSeek-V3 MoE structure and plugs in the in-tree DSA attention/indexer path that V3.2 uses. Docs and a dedicated test module are included as well.

Closes #41196.

Validation

  • PYTHONPATH=src python -m pytest tests/models/deepseek_v32/test_modeling_deepseek_v32.py -q
    • 123 passed, 129 skipped
  • PYTHONPATH=src python -m ruff check src/transformers/models/deepseek_v32 tests/models/deepseek_v32
    • passed
  • official auto-class resolution with this branch:
    • AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3.2") -> DeepseekV32Config
    • AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V3.2", ...) -> DeepseekV32ForCausalLM
  • end-to-end native load and generation against the published deepseek-ai/DeepSeek-V3.2 checkpoint completed successfully
  • current GitHub checks are green
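The auto-class resolution validated above keys off the config's `model_type`. As a toy sketch only (the real mappings in `configuration_auto.py` / `modeling_auto.py` are lazy and far larger; class names here are stand-ins for the actual classes), the lookup works roughly like this:

```python
# Toy sketch of model_type-based auto-class resolution. Simplified:
# transformers uses lazy mapping objects, not plain dicts.
CONFIG_MAPPING = {"deepseek_v32": "DeepseekV32Config"}
MODEL_FOR_CAUSAL_LM_MAPPING = {"deepseek_v32": "DeepseekV32ForCausalLM"}

def resolve_classes(model_type: str) -> tuple[str, str]:
    """Map a config's model_type to its config and causal-LM class names."""
    if model_type not in CONFIG_MAPPING:
        raise KeyError(f"Unrecognized model_type: {model_type!r}")
    return CONFIG_MAPPING[model_type], MODEL_FOR_CAUSAL_LM_MAPPING[model_type]

print(resolve_classes("deepseek_v32"))
# -> ('DeepseekV32Config', 'DeepseekV32ForCausalLM')
```

Registering `deepseek_v32` in both mappings is what lets `AutoConfig` / `AutoModelForCausalLM` resolve the checkpoint without `trust_remote_code`.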

Note

The public tokenizer config does not currently ship a chat_template, so apply_chat_template(...) still needs an explicit template.
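Until a `chat_template` ships with the tokenizer, callers can pass one explicitly via `apply_chat_template(messages, chat_template=...)` or assemble the prompt by hand. A minimal sketch of manual assembly, using hypothetical `<|role|>` markers (the actual DeepSeek-V3.2 special tokens may differ):

```python
# Hypothetical role markers -- the real DeepSeek-V3.2 special tokens may differ.
def build_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Concatenate chat messages into a single prompt string."""
    parts = [f"<|{m['role']}|>{m['content']}" for m in messages]
    if add_generation_prompt:
        # Cue the model to respond as the assistant.
        parts.append("<|assistant|>")
    return "".join(parts)

print(build_prompt([{"role": "user", "content": "Hello"}]))
# -> <|user|>Hello<|assistant|>
```

The same template logic, expressed as a Jinja string, could be passed through the `chat_template` argument of `apply_chat_template` once the correct special tokens are confirmed.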

XingyuHu109 force-pushed the add-deepseek-v32-support branch from adc22b1 to 03d1246 (March 5, 2026 23:49)
XingyuHu109 marked this pull request as ready for review (March 6, 2026 01:09)
Copilot AI review requested due to automatic review settings (March 6, 2026 01:09)
XingyuHu109 changed the title from "[WIP] Add native DeepSeek-V3.2 support" to "Add native DeepSeek-V3.2 support" (Mar 6, 2026)
Copilot AI left a comment

Pull request overview

This PR adds native Hugging Face Transformers support for the new DeepSeek-V3.2 architecture (deepseek_v32) so official checkpoints (e.g. deepseek-ai/DeepSeek-V3.2) resolve through standard auto-classes without requiring trust_remote_code.

Changes:

  • Introduces the deepseek_v32 model family (config + PyTorch modeling), registers it in auto-class mappings, and adds docs/tests.
  • Improves robustness of dynamic attention/experts-implementation detection (avoids KeyError when sys.modules lacks the model module entry).
  • Extends accelerate integration to support preload_module_classes dispatch (relevant for disk-offloaded MoE blocks) and fixes FP8 quantizer device-map validation precedence.
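The `sys.modules` robustness fix mentioned above boils down to preferring `.get()` over indexing, since a module entry can be evicted from the interpreter's module cache. A minimal illustration (not the actual transformers code):

```python
import sys

# Fragile: raises KeyError if the entry was removed from the module cache.
# module = sys.modules["package.that.was.evicted"]

# Robust: returns None instead, letting the caller fall back gracefully.
module = sys.modules.get("package.that.was.evicted")
print(module is None)
# -> True
```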

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/transformers/models/deepseek_v32/modular_deepseek_v32.py - Modular source defining DeepSeek-V3.2 config/model with DSA indexer + MoE.
  • src/transformers/models/deepseek_v32/configuration_deepseek_v32.py - Generated config for deepseek_v32.
  • src/transformers/models/deepseek_v32/modeling_deepseek_v32.py - Generated PyTorch implementation + generation support.
  • src/transformers/models/deepseek_v32/__init__.py - Lazy import plumbing for the new model package.
  • src/transformers/models/auto/configuration_auto.py - Registers the deepseek_v32 config in the auto config mapping + display name.
  • src/transformers/models/auto/modeling_auto.py - Registers the deepseek_v32 model + causal LM in the auto model mappings.
  • src/transformers/models/__init__.py - Exposes the new model package under transformers.models.
  • tests/models/deepseek_v32/test_modeling_deepseek_v32.py - New unit tests validating config fields, auto-class resolution, caching shapes, and disk-offloaded MoE behavior.
  • docs/source/en/model_doc/deepseek_v32.md - New model documentation page + usage example.
  • docs/source/en/_toctree.yml - Adds the DeepSeek-V3.2 docs page to the docs navigation.
  • src/transformers/integrations/accelerate.py - Adds preload_module_classes dispatch support with a compatibility shim for hook recursion.
  • src/transformers/modeling_utils.py - Makes _can_set_attn_implementation / _can_set_experts_implementation robust to missing sys.modules entries.
  • src/transformers/quantizers/quantizer_finegrained_fp8.py - Fixes device-map validation precedence and updates isinstance check style.
  • tests/quantization/finegrained_fp8/test_fp8.py - Adds a test ensuring pre-quantized FP8 models allow disk offload in device maps.
  • tests/utils/test_modeling_utils.py - Adds a regression test for a missing sys.modules module-cache entry.
  • utils/check_config_attributes.py - Allows V3.2 metadata fields not referenced by modeling code.
  • src/transformers/conversion_mapping.py / docs/source/en/weightconverter.md - Adds deepseek_v32 to the weight conversion pattern mapping.

Snippet under review (src/transformers/integrations/accelerate.py):

    Compatibility shim for `accelerate` releases that keep recursing into children after attaching a preloaded
    parent block hook. When `preload_module_classes` is active, the parent hook already manages its submodules.
    """
    if not isinstance(execution_device, Mapping) and not isinstance(offload, dict):
Copilot AI commented (Mar 6, 2026)

In _attach_align_device_hook_on_blocks_for_preload, the early-return branch checks not isinstance(offload, dict), but later the function treats offload as a generic Mapping. If accelerate ever passes a non-dict mapping (e.g. a types.MappingProxyType), this condition will be true and the code will incorrectly treat the mapping like a scalar boolean, potentially attaching the wrong hooks. Use Mapping consistently here (i.e. not isinstance(offload, Mapping)) so the scalar-vs-mapping logic is correct.

Suggested change
if not isinstance(execution_device, Mapping) and not isinstance(offload, dict):
if not isinstance(execution_device, Mapping) and not isinstance(offload, Mapping):
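The distinction matters in practice: a read-only mapping such as `types.MappingProxyType` satisfies `Mapping` but not `dict`, so the two checks diverge. A quick illustration:

```python
import types
from collections.abc import Mapping

# A per-module offload mapping wrapped in a read-only proxy.
offload = types.MappingProxyType({"model.layers.0": True})

print(isinstance(offload, dict))     # False -> scalar early-return branch taken (the bug)
print(isinstance(offload, Mapping))  # True  -> correctly treated as a per-module mapping
```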

Comment on lines +744 to +745
>>> model = DeepseekV32ForCausalLM.from_pretrained("meta-deepseek_v32/DeepseekV32-2-7b-hf")
>>> tokenizer = AutoTokenizer.from_pretrained("meta-deepseek_v32/DeepseekV32-2-7b-hf")
Copilot AI commented (Mar 6, 2026)

The docstring example references a model repo (meta-deepseek_v32/DeepseekV32-2-7b-hf) that does not match the official checkpoint this PR adds native support for (deepseek-ai/DeepSeek-V3.2). This will mislead users copying the snippet; please update the example to use the official model id (and matching tokenizer).

Suggested change
>>> model = DeepseekV32ForCausalLM.from_pretrained("meta-deepseek_v32/DeepseekV32-2-7b-hf")
>>> tokenizer = AutoTokenizer.from_pretrained("meta-deepseek_v32/DeepseekV32-2-7b-hf")
>>> model = DeepseekV32ForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V3.2")
>>> tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.2")

Comment on lines +196 to +204
@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
def test_left_padding_compatibility(self):
pass

@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
def test_sdpa_padding_matches_padding_free_with_position_ids(self):
pass

@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
Copilot AI commented (Mar 6, 2026)

Several tests are skipped with reasons phrased as uncertainty (e.g. "Not sure MoE can pass this..."). Skip reasons should be specific and actionable (ideally referencing a known limitation or a tracking issue) so it’s clear whether this is an expected permanent limitation or a temporary gap to fix.

Suggested change

Before:

@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
def test_left_padding_compatibility(self):
    pass

@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")
def test_sdpa_padding_matches_padding_free_with_position_ids(self):
    pass

@unittest.skip("Not sure MoE can pass this + indexer outputs are not deterministic wrt padding")

After:

@unittest.skip("Skipped: MoE routing with the DSA indexer produces non-deterministic outputs with respect to padding, so left-padding compatibility cannot be reliably tested.")
def test_left_padding_compatibility(self):
    pass

@unittest.skip("Skipped: MoE routing with the DSA indexer produces non-deterministic outputs with respect to padding, so SDPA vs padding-free behavior with position_ids cannot be reliably compared.")
def test_sdpa_padding_matches_padding_free_with_position_ids(self):
    pass

@unittest.skip("Skipped: MoE routing with the DSA indexer produces non-deterministic outputs with respect to padding, making this overfitting test unreliable.")

XingyuHu109 (Author) commented:

All checks are green now, and the DeepSeek-V3.2 native load/generation path has been validated on this branch. Would appreciate a review when someone working in this area has time.


github-actions bot commented Mar 7, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_v32, finegrained_fp8

Rocketknight1 (Member) commented:

cc @ArthurZucker!



Development

Successfully merging this pull request may close these issues.

[Model] Support Deepseek-V3.2-Exp