
[Bugfix] [Offloading] Save disk-offloaded buffers, Save converted weights#46902

Closed
kylesayrs wants to merge 10 commits into
huggingface:mainfrom
kylesayrs:kylesayrs/fix-disk-offloaded-ptr-buffer


@kylesayrs kylesayrs commented Jun 26, 2026

Contributor

CI

Purpose

  • Combine two bugfixes that together enable saving models with disk-offloaded weights
    • Fix saving models which are disk offloaded with offload_buffers=True
    • Fix saving models which are disk offloaded with weight conversions

Changes

  • Replace get_parameter call with get_parameter_or_buffer call
  • Expand load_offloaded_parameter to load a state dict of all weights associated with the checkpoint weight
    • checkpoint weight (all/any) -> model weight (one) -> reverted weights (all)
    • By using this loaded state dict to update the original state dict, we avoid redundant loading of offloaded weights
  • Loosen requirements for is_offloaded flag
    • This is backwards compatible, since full disk offloading was never supported in previous releases anyway
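
The "load once, update the state dict" idea from the second change can be sketched in plain Python. This is only an illustration under stated assumptions: `CountingDisk`, `load_group`, `OFFLOADED`, and the `mlp.*` weight names are all hypothetical stand-ins, lists play the role of tensors, and none of it is the actual transformers implementation.

```python
class CountingDisk:
    """Toy offload store that counts how often a model weight is read."""

    def __init__(self, weights):
        self._weights = weights
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self._weights[key]


def load_group(name, disk, to_model_key):
    """Hypothetical helper: load the model weight behind `name` and
    revert it into every checkpoint weight derived from it."""
    model_key = to_model_key[name]
    fused = disk.read(model_key)  # toy fused weight: gate half, then up half
    half = len(fused) // 2
    prefix = model_key.rsplit(".", 1)[0]
    return {
        f"{prefix}.gate_proj": fused[:half],
        f"{prefix}.up_proj": fused[half:],
    }


OFFLOADED = object()  # placeholder for a weight that still lives on disk

disk = CountingDisk({"mlp.gate_up_proj": [1, 2, 3, 4]})
to_model_key = {
    "mlp.gate_proj": "mlp.gate_up_proj",
    "mlp.up_proj": "mlp.gate_up_proj",
}
state_dict = {"mlp.gate_proj": OFFLOADED, "mlp.up_proj": OFFLOADED}

saved = {}
for name in sorted(state_dict):
    if state_dict[name] is OFFLOADED:
        # One disk read fills in every sibling, so later names skip the reload.
        state_dict.update(load_group(name, disk, to_model_key))
    saved[name] = state_dict.pop(name)
```

In this toy run the fused weight is read from disk once even though two checkpoint weights are saved, which is the redundancy the state-dict update is meant to avoid.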

Testing

Disk-offloaded models with conversion mappings can now be saved:

from transformers import AutoModelForCausalLM

# Load model with full disk offload
model = AutoModelForCausalLM.from_pretrained(
    "inference-optimization/DSV4-tiny-empty",
    device_map="auto",
    max_memory={},
    offload_folder="offload_folder",
)

# Save the model
model.save_pretrained("tmp_save")

Used these changes to quantize RedHatAI/DeepSeek-V4-Pro-NVFP4-FP8 and RedHatAI/GLM-5.2-NVFP4-FP8

Suggested Reviewers

@kylesayrs kylesayrs changed the title from "[Bugfix]" to "[Bugfix] Save disk-offloaded buffers" Jun 26, 2026
@kylesayrs kylesayrs changed the title [Bugfix] Save disk-offloaded buffers [Bugfix] [Offloading] Save disk-offloaded buffers, Save converted weights Jun 26, 2026
@kylesayrs kylesayrs force-pushed the kylesayrs/fix-disk-offloaded-ptr-buffer branch 2 times, most recently from fbe41de to 54dfbf8 Compare June 29, 2026 17:58
@kylesayrs kylesayrs marked this pull request as ready for review June 29, 2026 17:59

@SunMarc SunMarc left a comment

Member


Thanks, I left a couple of comments, but I think @Cyrilvallez might have better ideas on how to deal with those, as he's the one who coded this!

  filename = os.path.join(save_directory, shard_file)
  shard_state_dict = {}
- for tensor_name in tensor_names:
+ for tensor_name in sorted(tensor_names):

Member


Any specific reason to sort?

@kylesayrs kylesayrs Jun 30, 2026

Contributor Author


Note that load_offloaded_parameter may load multiple weights for a single tensor.
It is therefore possible to exhaust CPU memory by loading parameters in a bad order,
although in practice split_torch_state_dict_into_shards preserves weight locality.

Sorting reduces the chance that a bad load ordering occurs.

An example of bad load ordering would be

"layers.0.experts.0.up_proj" -> loads "layers.0.experts.gate_up_proj"
"layers.1.experts.0.up_proj" -> loads "layers.1.experts.gate_up_proj"
"layers.2.experts.0.up_proj" -> loads "layers.2.experts.gate_up_proj"

In this scenario, three separate gate_up_proj weights have been loaded onto the CPU, but only three shard weights have been consumed by state_dict.pop.

Sorting reduces the chances that split_torch_state_dict_into_shards gives an adversarially bad ordering. It doesn't fix adversarially bad ordering between shards, but there's not much we can do about that.
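
The effect of load ordering can be illustrated with a small simulation. This is a rough sketch under assumptions, not a model of the real code path: `fused_source` and `peak_resident` are hypothetical helpers, each fused weight is assumed to expand into all per-expert tensors of its layer when first touched, and memory is counted in tensors rather than bytes.

```python
from itertools import product


def fused_source(name: str) -> str:
    """Map a per-expert name to the (assumed) fused weight it is split from."""
    layer = name.split(".experts.")[0]
    return f"{layer}.experts.gate_up_proj"


def peak_resident(order, all_names):
    """Return the peak number of loaded-but-unsaved tensors for a save order."""
    groups = {}
    for name in all_names:
        groups.setdefault(fused_source(name), set()).add(name)

    resident = set()
    peak = 0
    for name in order:
        if name not in resident:
            # Loading one tensor pulls in every tensor split from the same fused weight.
            resident |= groups[fused_source(name)]
        peak = max(peak, len(resident))
        resident.discard(name)  # written to the shard, so it can be freed
    return peak


# 3 layers with 4 experts each
names = [
    f"layers.{layer}.experts.{expert}.up_proj"
    for layer, expert in product(range(3), range(4))
]

# Bad order: expert 0 of every layer first, then expert 1 of every layer, ...
interleaved = sorted(names, key=lambda n: (n.split(".")[3], n.split(".")[1]))

sorted_peak = peak_resident(sorted(names), names)
interleaved_peak = peak_resident(interleaved, names)
```

With these toy sizes, the sorted order keeps at most one layer's experts resident at a time, while the interleaved order has every layer's fused weight loaded before any layer is fully consumed.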

Comment thread src/transformers/integrations/accelerate.py
Comment thread src/transformers/integrations/accelerate.py
@SunMarc SunMarc requested a review from Cyrilvallez June 30, 2026 14:29
@kylesayrs kylesayrs force-pushed the kylesayrs/fix-disk-offloaded-ptr-buffer branch from f8b104d to 41e9caf Compare July 1, 2026 15:49
kylesayrs added 10 commits July 2, 2026 00:45
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@kylesayrs kylesayrs force-pushed the kylesayrs/fix-disk-offloaded-ptr-buffer branch from 7d663f8 to dbc8c39 Compare July 2, 2026 04:45

github-actions Bot commented Jul 2, 2026

Contributor

CI recap

Dashboard: View test results in Grafana
Latest run: 28566046728:2
Result: failure | Jobs: 14 | Tests: 72,772 | Failures: 1 | Duration: 16h 25m

@Cyrilvallez

Member

Hey @kylesayrs! I took the liberty of opening #47018 to fix the issue. I believe it is simpler and more robust in general!
Let me know if anything is still unclear.



3 participants