Fix save_pretrained with offloading and weight conversions by Cyrilvallez · Pull Request #47018 · huggingface/transformers

Cyrilvallez · 2026-07-02T08:14:14Z

What does this PR do?

As per the title. Currently, offloaded weights are on the meta device and reverse conversions happen BEFORE we reload them from disk, so the resulting converted tensors are on meta as well. We then usually fail when trying to load them back from the model, since their names have changed.
Since we cannot load everything back to cpu at the beginning with offloading (the cpu environment is usually memory-constrained, otherwise we would not offload), the best approach is to skip conversion at the beginning, load all params of a given file shard to cpu, and reverse-convert only this shard before starting the next. This is however not foolproof for one-weight-to-many conversions, because the reverse is many-weight-to-one: we need several weights to perform the conversion, and they may not all live in the same shard if we are unlucky with how the shards were created. In those cases, we raise a clear error, as we probably cannot do better without completely overloading the memory anyway.
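
The per-shard flow described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `reverse_convert_per_shard`, `SplitAcrossShardsError`, `merge_groups` and `load_to_cpu` are hypothetical names, parameters are plain Python values rather than tensors, and the "merge" is a stand-in that just collects the sources in order.

```python
from collections import defaultdict


class SplitAcrossShardsError(RuntimeError):
    """Raised when the sources of a many-to-one reverse conversion span several shards."""


def reverse_convert_per_shard(shards, merge_groups, load_to_cpu):
    """Reverse-convert one shard at a time (hypothetical sketch).

    shards:       list of lists of parameter names, one inner list per output file
    merge_groups: dict mapping a target name to the ordered source names that must
                  be merged into it (the many-to-one reverse of a one-to-many conversion)
    load_to_cpu:  callable returning the materialized value for a parameter name
    """
    source_to_target = {src: tgt for tgt, srcs in merge_groups.items() for src in srcs}
    for shard_index, names in enumerate(shards):
        # Materialize only this shard, so peak cpu memory stays around one shard.
        loaded = {name: load_to_cpu(name) for name in names}
        converted = {}
        pending = defaultdict(dict)
        for name, value in loaded.items():
            target = source_to_target.get(name)
            if target is None:
                converted[name] = value  # no reverse conversion needed
            else:
                pending[target][name] = value
        for target, parts in pending.items():
            missing = [src for src in merge_groups[target] if src not in parts]
            if missing:
                # Some sources live in another shard: fail clearly instead of loading more.
                raise SplitAcrossShardsError(
                    f"Cannot rebuild {target!r} in shard {shard_index}: "
                    f"{missing} are stored in a different shard."
                )
            # Stand-in for the real merge op: keep the sources in their declared order.
            converted[target] = [parts[src] for src in merge_groups[target]]
        yield shard_index, converted
```

With `merge_groups = {"qkv": ["q", "k", "v"]}`, the shards `[["q", "k", "v"], ["o"]]` convert cleanly, while `[["q", "k"], ["v"]]` raises the error, which is the "unlucky" case mentioned above.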

Supersedes #46902

HuggingFaceDocBuilderDev · 2026-07-02T08:30:19Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2026-07-02T10:04:39Z

CI recap

Dashboard: View test results in Grafana
Latest run: 28581147185:1
Result: success | Jobs: 13 | Tests: 46,177 | Failures: 0 | Duration: 16h 9m

kylesayrs

Looks good to me! I'll test this with some of my large moe models later today.

I think the one difference I can find between this and #46902 is that:

- #46902 handles many-to-one reversion optimally without erroring, but suffers increased cpu memory usage for unlucky sorting + one-to-many reversion (but afaict this can be completely negated)
- (this PR) handles many-to-one reversion with an error if unlucky, but handles one-to-many reversion optimally without erroring

My only caution is that, when many-to-one reversions do happen, the unlucky case of splitting across shards can be common, even when it's just (2/3-to-1). When compressed-tensors needed similar logic to fuse fused weights (3-to-1) and dequantization (2-to-1), we found that almost half of the shards ended up splitting these params.

By contrast, unlucky sorting + one-to-many reversion can essentially be avoided if the state_dict is sorted before split_torch_state_dict_into_shards.

So I do think that we should try to figure out when many-to-one conversions happen and whether this is going to block those kinds of models from being saved. If HF starts using many-to-one conversions more heavily, then this might have to change.

EDIT: I thought of one such use case where downstream user LLM Compressor will want to support reverting linearized MoE weights into packed 3D weights in order to match the original checkpoint's format for models like https://huggingface.co/collections/Qwen/qwen3-vl.
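
To make the sorting point above concrete, here is a toy sketch of how key order affects which related parameters share a shard. `greedy_shards` is a hypothetical stand-in that fills shards in iteration order; it is not `split_torch_state_dict_into_shards` itself, and the parameter names and sizes are made up for illustration.

```python
def greedy_shards(sizes, max_shard_size):
    """Hypothetical sharder: fill shards in iteration order up to max_shard_size."""
    shards, current, current_size = [], [], 0
    for name, size in sizes.items():
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


def split_groups(shards, group_of):
    """Return the groups whose members ended up in more than one shard."""
    seen = {}
    for index, names in enumerate(shards):
        for name in names:
            seen.setdefault(group_of(name), set()).add(index)
    return sorted(group for group, indices in seen.items() if len(indices) > 1)


def expert_of(name):
    # Group key: everything before the last dot, e.g. "experts.0.gate" -> "experts.0".
    return name.rsplit(".", 1)[0]


# Made-up sizes, in an "unlucky" interleaved order.
unsorted_sizes = {"experts.0.gate": 1, "experts.1.gate": 1, "experts.0.up": 1, "experts.1.up": 1}
sorted_sizes = dict(sorted(unsorted_sizes.items()))

unlucky = split_groups(greedy_shards(unsorted_sizes, max_shard_size=2), expert_of)
lucky = split_groups(greedy_shards(sorted_sizes, max_shard_size=2), expert_of)
```

In this toy case both expert groups are split with the interleaved order and none with the sorted order. Sorting only makes related keys contiguous; a contiguous group can still straddle a shard boundary, so it reduces splitting rather than ruling it out.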

kylesayrs · 2026-07-02T14:55:43Z

            is_offloaded = True
            warnings.warn(
                "Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory "
                "exceeds the `shard_size` (50GB default)"


Suggested change

                "exceeds the `shard_size` (50GB default) and/or the largest model weight size (this can "
                "be very large for MoE models with fused experts)."

kylesayrs · 2026-07-02T14:56:15Z

        if (
            hasattr(self, "hf_device_map")
            and len(set(self.hf_device_map.values())) > 1
            and ("cpu" in self.hf_device_map.values() or "disk" in self.hf_device_map.values())
        ):


Support full disk offloading

Suggested change

        if hasattr(self, "hf_device_map") and (
            len(set(self.hf_device_map.values())) > 1 or "disk" in self.hf_device_map.values()
        ):
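
As a quick way to compare the two conditions, the sketch below evaluates both on a few example device maps. The helper names and the example maps are hypothetical; only the boolean expressions are taken from the snippets above.

```python
def is_offloaded_original(device_map):
    """Condition from the reviewed code: several devices AND a cpu/disk entry."""
    values = device_map.values()
    return len(set(values)) > 1 and ("cpu" in values or "disk" in values)


def is_offloaded_suggested(device_map):
    """Condition from the suggested change: several devices OR a disk entry."""
    values = device_map.values()
    return len(set(values)) > 1 or "disk" in values


# Hypothetical device maps for illustration.
full_disk = {"": "disk"}                          # everything offloaded to disk
gpu_plus_cpu = {"model.embed": 0, "model.layers": "cpu"}
single_gpu = {"": 0}
two_gpus = {"model.embed": 0, "model.layers": 1}

results = {
    name: (is_offloaded_original(dm), is_offloaded_suggested(dm))
    for name, dm in [
        ("full_disk", full_disk),
        ("gpu_plus_cpu", gpu_plus_cpu),
        ("single_gpu", single_gpu),
        ("two_gpus", two_gpus),
    ]
}
```

The full-disk map is the case the suggestion targets: the original expression is false there because only one distinct device is present. In this sketch the suggested expression is also true for a map that spans two GPUs with no cpu or disk entry.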

Cyrilvallez added 2 commits July 2, 2026 17:07

fix save

a09016d

fix

cf31d74

Cyrilvallez added 3 commits July 2, 2026 18:35

fix

78377b4

ignore stupid ty

4b0bc7a

add buffers

1ee8e25

Cyrilvallez mentioned this pull request Jul 2, 2026

[Bugfix] [Offloading] Save disk-offloaded buffers, Save converted weights #46902

Closed

kylesayrs approved these changes Jul 2, 2026

kylesayrs reviewed Jul 2, 2026

Cyrilvallez wants to merge 5 commits into main from fix-save-offloading


3 participants