[N4Landing]update #1538

Open
wenhuach21 wants to merge 2 commits into main from fp4_v3

Conversation

@wenhuach21
Contributor

Description

Please briefly describe your main changes and the motivation behind them.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Copilot AI review requested due to automatic review settings March 12, 2026 09:15
Contributor

Copilot AI left a comment


Pull request overview

This PR extends the auto_round.data_type.nvfp quantization utilities by introducing a new FP4 quantization variant (fp4_v3) and its reference implementation, alongside a small cleanup of unused logging import.

Changes:

  • Remove unused logger import from auto_round/data_type/nvfp.py.
  • Add ref_fp4_quant_v3 reference FP4 quantization routine.
  • Register a new quantization dtype entry point: @register_dtype("fp4_v3").
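The `@register_dtype("fp4_v3")` entry point presumably follows a standard decorator-registry pattern; a minimal self-contained sketch under that assumption (the registry name `QUANT_FUNC_WITH_DTYPE` is taken from the review comments in this thread; everything else is illustrative, not auto_round's actual implementation):

```python
# Hypothetical sketch of a decorator-based dtype registry; auto_round's
# real register_dtype may differ in detail.
QUANT_FUNC_WITH_DTYPE = {}

def register_dtype(name):
    """Register a quantization function under a dtype name."""
    def decorator(fn):
        QUANT_FUNC_WITH_DTYPE[name] = fn
        return fn
    return decorator

@register_dtype("fp4_v3")
def fp4_v3(tensor, bits=4, group_size=32, v=0, max_scale=1.0, **kwargs):
    ...  # quant-dequant body elided

# Lookup by dtype name returns the registered function.
assert QUANT_FUNC_WITH_DTYPE["fp4_v3"] is fp4_v3
```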


Comment on lines +251 to +256
@register_dtype("fp4_v3")
def fp4_v3(tensor, bits=4, group_size=32, v=0, max_scale=1.0, **kwargs):
    orig_dtype = tensor.dtype
    tensor, orig_shape, pad_len = reshape_pad_tensor_by_group_size(tensor, group_size)
    global_scale = 1.0
    qdq_res, scale = ref_fp4_quant_v3(tensor, global_scale, group_size, v, max_scale)

Copilot AI Mar 12, 2026


fp4_v3 doesn’t validate group_size, while fp4_v2/fp4_v2_with_global_scale explicitly restrict it to 16 or 32. If fp4_v3 has the same constraints, add the same assertion (or otherwise handle/describe supported values) to avoid silently producing unexpected scaling for unsupported group sizes.

Comment on lines 250 to +257

@register_dtype("fp4_v3")
def fp4_v3(tensor, bits=4, group_size=32, v=0, max_scale=1.0, **kwargs):
    orig_dtype = tensor.dtype
    tensor, orig_shape, pad_len = reshape_pad_tensor_by_group_size(tensor, group_size)
    global_scale = 1.0
    qdq_res, scale = ref_fp4_quant_v3(tensor, global_scale, group_size, v, max_scale)
    qdq_res = revert_tensor_by_pad(qdq_res, orig_shape=orig_shape, pad_len=pad_len)

Copilot AI Mar 12, 2026


New registered dtype fp4_v3/ref_fp4_quant_v3 adds a new quantization path but there are no corresponding unit tests covering it. Please add a small CPU test that exercises get_quant_func('fp4_v3', ...) (or QUANT_FUNC_WITH_DTYPE['fp4_v3']) and verifies output shape matches input, scale shape matches the number of groups, and results stay within the intended FP4 range after quant-dequant.
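A self-contained sketch of such a test, using a toy NumPy E2M1 quant-dequant in place of the real `fp4_v3` (the real path in `auto_round.data_type.nvfp` is torch-based and may scale differently; all names below are illustrative assumptions):

```python
import numpy as np

# Representable E2M1 (FP4) magnitudes.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def toy_fp4_qdq(x, group_size=32):
    """Per-group quant-dequant: scale = group amax / 6, round to nearest FP4 code."""
    groups = x.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 6.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing all-zero groups
    scaled = groups / scale
    # Round each scaled value to the nearest representable FP4 magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_VALUES).argmin(axis=-1)
    qdq = np.sign(scaled) * FP4_VALUES[idx] * scale
    return qdq.reshape(x.shape), scale

x = np.random.default_rng(0).standard_normal((4, 64)).astype(np.float32)
qdq, scale = toy_fp4_qdq(x)
assert qdq.shape == x.shape                 # output shape matches input
assert scale.shape == (x.size // 32, 1)     # one scale per group
# After dividing the scale back out, every value is a valid FP4 code.
codes = np.abs(qdq.reshape(-1, 32) / scale)
assert np.all(np.isclose(codes[..., None], FP4_VALUES).any(axis=-1))
```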

Comment on lines 250 to +255

@register_dtype("fp4_v3")
def fp4_v3(tensor, bits=4, group_size=32, v=0, max_scale=1.0, **kwargs):
    orig_dtype = tensor.dtype
    tensor, orig_shape, pad_len = reshape_pad_tensor_by_group_size(tensor, group_size)
    global_scale = 1.0

Copilot AI Mar 12, 2026


The PR title/description are currently the default template and don’t explain what fp4_v3 is intended to change vs fp4_v2 (e.g., why scale uses bf16 and why UE5M3 clipping/casting is removed). Please update the PR description to document the motivation and expected usage so reviewers can validate correctness.

wenhuach21 changed the title from [N4Landing][update to [N4Landing]update Mar 13, 2026
