Conversation
for more information, see https://pre-commit.ci
Pull request overview
This PR extends the auto_round.data_type.nvfp quantization utilities by introducing a new FP4 quantization variant (fp4_v3) and its reference implementation, alongside a small cleanup of an unused logging import.
Changes:
- Remove the unused `logger` import from `auto_round/data_type/nvfp.py`.
- Add the `ref_fp4_quant_v3` reference FP4 quantization routine.
- Register a new quantization dtype entry point: `@register_dtype("fp4_v3")`.
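For context, a registry decorator like `register_dtype` typically maps a dtype name to its quantization function. The sketch below is a hypothetical illustration of that pattern; auto_round's actual implementation may differ in naming and bookkeeping, and the placeholder body stands in for the real quant-dequant routine.

```python
# Hypothetical sketch of a register_dtype-style registry; not auto_round's code.
QUANT_FUNC_WITH_DTYPE = {}

def register_dtype(name):
    """Register a quantization function under a dtype name."""
    def decorator(func):
        QUANT_FUNC_WITH_DTYPE[name] = func
        return func
    return decorator

@register_dtype("fp4_v3")
def fp4_v3(tensor, bits=4, group_size=32, **kwargs):
    # Placeholder body standing in for the real quant-dequant routine.
    return tensor
```

With this pattern, callers can look up the entry point by name, e.g. `QUANT_FUNC_WITH_DTYPE["fp4_v3"]`.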
```python
@register_dtype("fp4_v3")
def fp4_v3(tensor, bits=4, group_size=32, v=0, max_scale=1.0, **kwargs):
    orig_dtype = tensor.dtype
    tensor, orig_shape, pad_len = reshape_pad_tensor_by_group_size(tensor, group_size)
    global_scale = 1.0
    qdq_res, scale = ref_fp4_quant_v3(tensor, global_scale, group_size, v, max_scale)
```
fp4_v3 doesn’t validate group_size, while fp4_v2/fp4_v2_with_global_scale explicitly restrict it to 16 or 32. If fp4_v3 has the same constraints, add the same assertion (or otherwise handle/describe supported values) to avoid silently producing unexpected scaling for unsupported group sizes.
```python
@register_dtype("fp4_v3")
def fp4_v3(tensor, bits=4, group_size=32, v=0, max_scale=1.0, **kwargs):
    orig_dtype = tensor.dtype
    tensor, orig_shape, pad_len = reshape_pad_tensor_by_group_size(tensor, group_size)
    global_scale = 1.0
    qdq_res, scale = ref_fp4_quant_v3(tensor, global_scale, group_size, v, max_scale)
    qdq_res = revert_tensor_by_pad(qdq_res, orig_shape=orig_shape, pad_len=pad_len)
```
New registered dtype fp4_v3/ref_fp4_quant_v3 adds a new quantization path but there are no corresponding unit tests covering it. Please add a small CPU test that exercises get_quant_func('fp4_v3', ...) (or QUANT_FUNC_WITH_DTYPE['fp4_v3']) and verifies output shape matches input, scale shape matches the number of groups, and results stay within the intended FP4 range after quant-dequant.
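The invariants such a test should check can be made concrete with a dependency-free toy quantizer. The code below is NOT the auto_round implementation: it snaps values to the FP4 (E2M1) grid per group purely so the shape, scale-count, and range assertions are executable; a real test would instead call the registered `fp4_v3` function on a torch tensor.

```python
# FP4 (E2M1) representable magnitudes.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def toy_fp4_qdq(values, group_size=32):
    """Quant-dequant a flat list of floats in groups of `group_size`."""
    qdq, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(x) for x in group) or 1.0
        scale = amax / 6.0  # map the group max onto the largest FP4 magnitude
        scales.append(scale)
        for x in group:
            mag = min(E2M1_GRID, key=lambda g: abs(g - abs(x) / scale))
            qdq.append(mag * scale if x >= 0 else -mag * scale)
    return qdq, scales
```

A test along these lines would assert that the output length matches the input, that one scale is produced per group, and that every quant-dequant value stays within 6x its group scale (the FP4 range).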
```python
@register_dtype("fp4_v3")
def fp4_v3(tensor, bits=4, group_size=32, v=0, max_scale=1.0, **kwargs):
    orig_dtype = tensor.dtype
    tensor, orig_shape, pad_len = reshape_pad_tensor_by_group_size(tensor, group_size)
    global_scale = 1.0
```
The PR title/description are currently the default template and don’t explain what fp4_v3 is intended to change vs fp4_v2 (e.g., why scale uses bf16 and why UE5M3 clipping/casting is removed). Please update the PR description to document the motivation and expected usage so reviewers can validate correctness.
Description
Please briefly describe your main changes, the motivation.
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting