feat: add quantization support to benchmarking script #3

Merged: Lothnic merged 1 commit into main from feature/benchmark-quant on Apr 27, 2026
Conversation

Lothnic (Owner) commented Apr 27, 2026

Summary by CodeRabbit

  • New Features

    • Added optional --quantize/-q flag to benchmark tool for enabling 4-bit model quantization.
  • Documentation

    • Updated dequantization formula presentation in README for improved clarity.

coderabbitai Bot (Contributor) commented Apr 27, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR introduces a quantize boolean flag to the benchmark entrypoint, enabling optional 4-bit NF4 quantization during model loading. The flag is exposed via CLI argument and propagated through the benchmark function. Additionally, an unused import is removed from the Qwen3 model module, and documentation formatting is simplified.
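For readers unfamiliar with 4-bit NF4, the following is a minimal sketch of block-wise quantization against a 16-entry codebook, with dequantization as `codebook[index] * absmax`. It is not taken from this repository: the function names `quantize_block` and `dequantize_block` are hypothetical, the codebook values are the NF4 levels as commonly published and are treated as an assumption here, and the dequantization formula in the README may be presented differently.

```python
# Assumption: NF4 code values as commonly published; not copied from this repo.
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_block(weights):
    """Map one block of floats to 4-bit codebook indices plus a scale.

    Hypothetical helper: scales the block by its absolute maximum, then
    picks the nearest codebook level for each normalized weight.
    """
    absmax = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    indices = [
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / absmax))
        for w in weights
    ]
    return indices, absmax


def dequantize_block(indices, absmax):
    """Reconstruct approximate weights: codebook value times the block scale."""
    return [NF4_LEVELS[i] * absmax for i in indices]


block = [0.5, -0.25, 0.0, 0.1]
indices, absmax = quantize_block(block)
restored = dequantize_block(indices, absmax)
```

Each weight is stored as a 4-bit index and each block carries one scale, so the reconstruction is approximate: the largest-magnitude weight in a block round-trips exactly, while the others snap to the nearest level.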

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Update: README.md | Dequantisation Formula presentation changed from LaTeX math block to inline code styling with spacing adjustment. |
| Benchmark Enhancement: benchmark.py | Added quantize: bool parameter to benchmark() function signature. Updated CLI parser with --quantize/-q argument and forwarded the flag to load_hf_model() for conditional model quantization. |
| Import Cleanup: models/qwen3.py | Removed unused Attention as LlamaAttention import from models.attention module. |
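The flag plumbing summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from benchmark.py: the real signatures of benchmark() and load_hf_model() are not shown in this conversation, so the stand-in loader, the `--model` argument, and the return values are all hypothetical. Only the `--quantize`/`-q` flag and the `quantize: bool` parameter come from the summary.

```python
import argparse


def load_hf_model(model_name: str, quantize: bool = False) -> dict:
    """Hypothetical stand-in for the repository's loader.

    The real function presumably loads weights and, when quantize is True,
    applies 4-bit NF4 quantization. Here it only records the request.
    """
    return {"model": model_name, "quantized": quantize}


def benchmark(model_name: str, quantize: bool = False) -> dict:
    """Forward the quantize flag to the loader (timing logic omitted)."""
    return load_hf_model(model_name, quantize=quantize)


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser exposing the optional --quantize/-q switch."""
    parser = argparse.ArgumentParser(description="Benchmark sketch")
    # Assumption: a --model option exists; the name is illustrative only.
    parser.add_argument("--model", default="example-model")
    parser.add_argument(
        "-q", "--quantize",
        action="store_true",  # defaults to False when the flag is absent
        help="enable 4-bit model quantization",
    )
    return parser


def main(argv=None) -> dict:
    """Parse arguments and run the benchmark with the chosen settings."""
    args = build_parser().parse_args(argv)
    return benchmark(args.model, quantize=args.quantize)
```

Using `action="store_true"` keeps quantization off by default, so existing invocations behave as before. Calling `main(["-q"])` or `main(["--quantize"])` passes `quantize=True` through to the loader.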

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • Lothnic/vllmini#2 (Feature/quantisation): Adds matching --quantize/-q CLI argument and quantize: bool parameter to the benchmark function for conditional model quantization.

Poem

🐰 A flag to quantize, a choice so neat,
Compress the weights, make models lean and sweet,
Through CLI pipes the option flows,
Four bits of wisdom, efficiency grows! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add quantization support to benchmarking script' accurately describes the main change in the pull request: adding a quantize flag to the benchmark.py script with conditional 4-bit NF4 quantization support. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Lothnic merged commit cf87818 into main on Apr 27, 2026
1 of 2 checks passed