Enable low_cpu_mem_usage for mxfp/nvfp #1648

Open
Kaihui-intel wants to merge 1 commit into main from kaihui/low_cpu_mem_usage
Kaihui-intel commented Apr 2, 2026

Description

#1127

Peak memory for Qwen3-32B (nvfp4) with low_cpu_mem_usage enabled vs. disabled:
enabled:  'peak_ram': 10.35GB, 'peak_vram': 60.75GB
disabled: 'peak_ram': 30.35GB, 'peak_vram': 60.75GB
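
To put the two measurements in perspective, the RAM saving can be worked out directly from the reported figures. This is only arithmetic on the numbers quoted above; it assumes the first line (10.35GB) is the run with low_cpu_mem_usage enabled and the second (30.35GB) is the run with it disabled, following the "enable/disable" order in the heading.

```python
# Figures copied from the PR description. Which run is which is an assumption:
# first line = low_cpu_mem_usage enabled, second line = disabled.
peak_ram_enabled_gb = 10.35
peak_ram_disabled_gb = 30.35

# Absolute and relative reduction in peak host RAM.
saved_gb = round(peak_ram_disabled_gb - peak_ram_enabled_gb, 2)
reduction_pct = round(100 * saved_gb / peak_ram_disabled_gb, 1)

print(f"peak RAM saved: {saved_gb} GB ({reduction_pct}% lower)")
```

Peak VRAM is 60.75GB in both runs, so under this reading the option only affects host memory.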

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Copilot AI review requested due to automatic review settings April 2, 2026 08:26

Copilot AI left a comment

Pull request overview

Removes a guard that previously disabled immediate_saving for non-integer quantization types, likely to allow memory-saving flows for additional quantization formats.

Changes:

  • Removed the runtime check that forced immediate_saving=False when data_type wasn’t integer-based.
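
The effect of removing such a guard can be illustrated with a minimal sketch. The function names, the `"int" in data_type` test, and the data type strings below are hypothetical and chosen only for illustration; the actual diff is not shown in this conversation and may look different.

```python
def resolve_immediate_saving_before(data_type: str, immediate_saving: bool) -> bool:
    """Hypothetical pre-change behaviour: non-integer data types force the flag off."""
    if immediate_saving and "int" not in data_type:
        # Guard described in the review: immediate saving was limited to integer types.
        return False
    return immediate_saving


def resolve_immediate_saving_after(data_type: str, immediate_saving: bool) -> bool:
    """Hypothetical post-change behaviour: the requested value is kept for every data type."""
    return immediate_saving


# Illustrative data type names (assumed, not taken from the diff).
for dtype in ("int", "mx_fp", "nv_fp"):
    before = resolve_immediate_saving_before(dtype, True)
    after = resolve_immediate_saving_after(dtype, True)
    print(f"{dtype}: before={before}, after={after}")
```

With the guard in place, a request for immediate saving on a floating-point format is silently dropped; without it, the request is honoured, which is what would let the memory-saving path run for mxfp/nvfp.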
