Conversation
Pull request overview
Refactors AutoRound toward a new “context + compressor + algorithm” architecture, introducing new compressors_new/ and context/ modules and updating scheme parsing/export helpers to support the new flow.
Changes:
- Added new context singletons (`ModelContext`, `CompressContext`) and a new `compressors_new` implementation path.
- Expanded scheme parsing to reconcile `bits`/`data_type` and support user overrides + AutoScheme integration.
- Added new calibration utilities and algorithm scaffolding for quantization backends (AutoRound/RTN).
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| auto_round/utils/model.py | Avoids runtime import cycles via TYPE_CHECKING for QuantizationScheme. |
| auto_round/schemes.py | Adds scheme override + parsing helpers and bits/dtype reconciliation. |
| auto_round/formats.py | Switches divisibility checks to global supported-layer constants. |
| auto_round/context/model_context.py | Introduces model lifecycle/loading + AMP setup and forward-hook management. |
| auto_round/context/compress_context.py | Introduces device/device_map and memory-usage knobs as shared context. |
| auto_round/context/base.py | Adds simple singleton context base. |
| auto_round/context/__init__.py | Package init for new context module. |
| auto_round/compressors_new/utils.py | New utility module (layer config, gguf mapping, caching helpers, forward helpers). |
| auto_round/compressors_new/shard_writer.py | New shard-based saver with optional safetensors support. |
| auto_round/compressors_new/config.py | Introduces extra/legacy config dataclasses for the new compressor path. |
| auto_round/compressors_new/base.py | New “BaseCompressor” implementation wiring contexts, formats, caching, quant loop. |
| auto_round/compressors_new/__init__.py | Package init for compressors_new. |
| auto_round/compressors/utils.py | Extends legacy layer-config resolution to include safetensors-only tensors and skip missing modules. |
| auto_round/calibration/utils.py | Adds helpers for “early stop” caching and input reshaping for block tuning. |
| auto_round/calibration/__init__.py | Package init for calibration. |
| auto_round/algorithms/quantization/rtn/rtn.py | Adds placeholder RTN quantization module file. |
| auto_round/algorithms/quantization/rtn/config.py | Adds RTN algorithm config stub. |
| auto_round/algorithms/quantization/rtn/__init__.py | Package init for RTN quantization. |
| auto_round/algorithms/quantization/base.py | Adds base quantization class stub. |
| auto_round/algorithms/quantization/auto_round/quantize.py | Adds new AutoRound quantizer implementation (algorithm object). |
| auto_round/algorithms/quantization/auto_round/config.py | Adds new AutoRound algorithm config. |
| auto_round/algorithms/quantization/auto_round/__init__.py | Package init for AutoRound quantization algorithm. |
| auto_round/algorithms/quantization/__init__.py | Package init for quantization algorithms. |
| auto_round/algorithms/base.py | Adds base algorithm stub. |
| auto_round/algorithms/alg_config.py | Adds base algorithm config stub. |
| auto_round/algorithms/__init__.py | Package init for algorithms. |

If there is already an algorithm folder, what is the purpose of the compressor folder?
```python
import torch


class ExtraConfig:
```
ExtraConfig is a monolithic catch-all config class.
ExtraConfig bundles tuning, scheme, MLLM, and diffusion settings into a single class — the opposite of llm-compressor's approach where each modifier owns its own typed config. This "one object owns everything" pattern makes it harder to add new algorithms independently and is a carryover from the old monolithic design rather than a step toward the intended modular architecture.
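For contrast, a minimal sketch of the per-algorithm typed-config alternative; all class and field names here are illustrative, not part of the actual codebase:

```python
from dataclasses import dataclass

@dataclass
class TuningConfig:           # owned by the block-tuning algorithm
    iters: int = 200
    lr: float = 0.005

@dataclass
class MLLMConfig:             # owned by the multimodal calibration path
    nsamples: int = 128
    template: str = "default"

@dataclass
class DiffusionConfig:        # owned by the diffusion compressor
    guidance_scale: float = 7.5

# Each algorithm consumes only its own config; nothing bundles all of them
# into one ExtraConfig-style catch-all object.
```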
Despite this PR's goal of separating concerns into Context/Algorithm/Compressor, BaseCompressor still owns everything: config parsing, calibration data collection, forward hook management, quantization loop control, and model saving. By contrast, llm-compressor distributes these responsibilities across dedicated Pipeline (calibration), Modifier (algorithm logic), Session (lifecycle orchestration), and entrypoint (API) layers. The refactor restructures the file layout without achieving real decoupling.
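A rough sketch of the layering this comment describes, under the assumption of llm-compressor-like responsibilities; all names here are illustrative, not real APIs:

```python
class Pipeline:
    """Owns calibration: runs the model and captures per-block inputs via hooks."""
    def collect_calibration_data(self, model, dataloader): ...

class Modifier:
    """Owns algorithm logic only, e.g. tuning one block's rounding values."""
    def apply(self, block, block_inputs): ...

class Session:
    """Owns lifecycle: orders modifiers, drives the pipeline, triggers saving."""
    def run(self, model, modifiers: list, pipeline: Pipeline): ...

# The entrypoint layer would then be a thin API that builds a Session, instead
# of BaseCompressor owning parsing, hooks, the quant loop, and saving itself.
```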
auto_round/compressors_new/base.py
```python
        **kwargs,
    ):
        self.quantize_config = None
        self.rotation_configs: list[BaseRotationConfig] = []
```
Using two or more rotation configs sequentially on the same model is not supported. We can add layer-wise rotation configs later (that is not related to this PR). So on this line, I think we should support just one rotation config.
@n1ck-guo For the new API usage, would it be better to determine the order of applying configs based on the order in the config list?

Yes, you’re right. We should preserve the algorithm order as provided by the user unless there are technical limitations. However, for unsupported or suboptimal orders, such as applying AR before Hadamard, we should log a warning and provide a recommended order.
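A minimal sketch of that order check, assuming hypothetical config class names and a hypothetical recommended ranking:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical recommended ordering: rotation transforms before weight tuning.
RECOMMENDED_ORDER = {"HadamardConfig": 0, "AutoRoundConfig": 1}

def check_alg_order(alg_configs: list) -> None:
    """Warn when the user-provided algorithm order differs from the recommended one."""
    ranks = [RECOMMENDED_ORDER.get(type(cfg).__name__, 99) for cfg in alg_configs]
    if ranks != sorted(ranks):
        logger.warning(
            "Algorithm order %s may be suboptimal; the recommended order applies "
            "rotation (e.g. Hadamard) before tuning (e.g. AutoRound).",
            [type(cfg).__name__ for cfg in alg_configs],
        )
```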
To better separate the interface from the algorithm, I suggest renaming the algorithm from AutoRound to SignRound, and updating AutoRoundConfig to SignRoundConfig. In addition, the API should accept either a string or an enum, e.g., alg_configs=["SignRound","Hadamard"].
Added to the TODO list; it will be implemented in a later PR.
If we use strings, how should the relevant configurations be passed? Should we use the defaults?

Yes, just like scheme, which supports str, dict, and dataclass. In this case, supporting str helps simplify usage, while advanced users can use a class-based configuration as shown in your example.
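A hypothetical sketch of that str/dict/dataclass acceptance, mirroring how scheme parsing already works; the names here are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class SignRoundConfig:  # hypothetical per-algorithm config
    iters: int = 200
    lr: float = 0.005

# Hypothetical registry of default configs addressable by name.
ALG_REGISTRY = {"SignRound": SignRoundConfig}

def normalize_alg_config(cfg):
    """Accept a string (defaults), a dict (named overrides), or a config object."""
    if isinstance(cfg, str):
        return ALG_REGISTRY[cfg]()
    if isinstance(cfg, dict):
        cfg = dict(cfg)  # avoid mutating the caller's dict
        return ALG_REGISTRY[cfg.pop("name")](**cfg)
    return cfg  # already a dataclass instance

# normalize_alg_config("SignRound") and
# normalize_alg_config({"name": "SignRound", "iters": 1000})
# both resolve to a SignRoundConfig.
```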
```python
        *,
        iters: int = 200,
        lr: float = None,
        minmax_lr: float = None,
        lr_scheduler=None,
        seqlen: int = 2048,
        nsamples: int = 128,
```
One TODO: if different algorithms require different `nsamples` or `seqlen`, and the pipeline runs them sequentially for each block, the calibration process should use the maximum required values and truncate the data accordingly for each algorithm.
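A minimal sketch of that calibrate-once-then-truncate idea; the function names and the assumed cached-input shape are illustrative, not part of the codebase:

```python
def calib_requirements(alg_configs):
    """Calibrate once with the maximum nsamples/seqlen any algorithm needs."""
    max_nsamples = max(cfg.nsamples for cfg in alg_configs)
    max_seqlen = max(cfg.seqlen for cfg in alg_configs)
    return max_nsamples, max_seqlen

def slice_for_alg(block_inputs, cfg):
    """Truncate cached block inputs down to what one algorithm requires."""
    # Assumes block_inputs has shape (nsamples, seqlen, hidden), captured once.
    return block_inputs[: cfg.nsamples, : cfg.seqlen]
```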
```diff
-from auto_round.algorithms.rotation.hadamard.config import HadamardConfig, normalize_hadamard_config
-from auto_round.algorithms.rotation.hadamard.transforms import build_hadamard_transform
+from auto_round.algorithms.transforms.base import BaseRotation
+from auto_round.algorithms.transforms.hadamard.config import HadamardConfig, normalize_hadamard_config
```
I’d prefer using rotation, but it’s up to you.
test/__init__.pyc
Why is a .pyc file uploaded here?
```python
        self.immediate_packing = is_immediate_packing
        self.is_immediate_packing = is_immediate_packing
        self.is_immediate_saving = is_immediate_saving
        self.formats = formats
```
`self.formats` is initialized twice.
```python
            return scheme.dataset
        return "NeelNanda/pile-10k"


    def post_init(self) -> None:
```
Better to extract the five phases into standalone pipeline methods with explicit inputs/outputs. Each phase should become its own method with clearly documented preconditions, postconditions, and return values, so that post_init() reduces to a thin sequential orchestrator calling them in order.
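A minimal sketch of post_init() as that thin orchestrator; the five phase-method names are hypothetical placeholders, not the actual methods:

```python
def post_init(self) -> None:
    """Run the five setup phases in order; each phase owns its own I/O contract."""
    scheme = self._resolve_scheme()             # precondition: user config parsed
    layer_config = self._build_layer_config(scheme)
    self._prepare_model_context(layer_config)   # loads model, sets AMP/device
    self._register_forward_hooks()
    self._validate_formats()                    # postcondition: ready to quantize
```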
| """Generate per-layer config via AutoScheme delta-loss selection.""" | ||
| if self.model_context.is_mllm: | ||
| logger.info("AutoScheme is not yet supported for multimodal LLMs.") | ||
| sys.exit(-1) |
Better to raise a RuntimeError instead of exit().
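For instance, keeping the same condition as the excerpt above:

```python
if self.model_context.is_mllm:
    raise RuntimeError("AutoScheme is not yet supported for multimodal LLMs.")
```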
```python
        pass


    return DiffusionZeroShotCompressor(alg_configs, **local_args, **kwargs)
```
Better to move the 8 dynamically-created empty Mixin classes to module-level definitions, or replace them with a registry-based dispatch.
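A minimal sketch of the registry-based alternative; the registry, decorator, and factory names are illustrative only:

```python
COMPRESSOR_REGISTRY: dict[tuple[str, str], type] = {}

def register_compressor(modality: str, mode: str):
    """Class decorator that registers a compressor under (modality, mode)."""
    def decorator(cls):
        COMPRESSOR_REGISTRY[(modality, mode)] = cls
        return cls
    return decorator

@register_compressor("diffusion", "zero_shot")
class DiffusionZeroShotCompressor:  # stand-in for the real class
    def __init__(self, alg_configs, **kwargs):
        self.alg_configs = alg_configs

def create_compressor(modality: str, mode: str, alg_configs, **kwargs):
    """Look up and instantiate the right compressor, replacing mixin dispatch."""
    return COMPRESSOR_REGISTRY[(modality, mode)](alg_configs, **kwargs)
```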
Description
Main entry point responsible for orchestrating the workflow, invoking different algorithms, and handling model persistence. Supports block-wise or layer-wise quantization strategies. Primary subclasses include TuneCompressor and ZeroShotCompressor.
Usage of the new API:
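The original usage snippet did not survive extraction; the following is a hypothetical sketch assembled from this thread, in which the import paths, the `TuneCompressor` constructor signature, and the method names are assumptions, not the final API:

```python
# Assumed paths/names for illustration only.
from auto_round.algorithms.quantization.auto_round.config import AutoRoundConfig
from auto_round.compressors_new import TuneCompressor  # assumed import path

alg_configs = [AutoRoundConfig(iters=200, nsamples=128, seqlen=2048)]
compressor = TuneCompressor(model, alg_configs=alg_configs, scheme="W4A16")
compressor.quantize()
compressor.save_quantized("./quantized_model")
```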
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting