
new architecture for auto_round #1542

Open
n1ck-guo wants to merge 50 commits into main from hengguo/new_ar_arch

Conversation

@n1ck-guo
Contributor

@n1ck-guo n1ck-guo commented Mar 13, 2026

Description

  • Compressor:
    Main entry point responsible for orchestrating the workflow, invoking different algorithms, and handling model persistence. Supports block-wise or layer-wise quantization strategies. Primary subclasses include TuneCompressor and ZeroShotCompressor.
  • Calibration: Handles the calibration process (Work in Progress)
  • Context: Manages shared configurations and model states throughout the quantization pipeline, providing centralized control to prevent cross-module dependencies
    • ModelContext: Handles model loading and tracks model states and relevant configurations
    • CompressContext: Stores shared compression settings such as low_cpu_mem_usage, enable_torch_compile, etc.
  • Algorithms: Concrete quantization and weight transformation implementations
    • Quantization: Various quantization algorithms, including AutoRound, RTN, OptRTN, etc.
    • Transform: Weight transformation algorithms such as Hadamard transform
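The Context components above are described as singletons that centralize shared state. A minimal sketch of how such a singleton context base might look follows; the real implementation lives in auto_round/context/base.py, and the attribute set here is assumed from the description, not the actual code.

```python
class SingletonContext:
    """One shared instance per concrete context class."""
    _instances = {}

    def __new__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]


class CompressContext(SingletonContext):
    """Shared compression settings, per the description above."""
    def __init__(self, low_cpu_mem_usage=False, enable_torch_compile=False):
        self.low_cpu_mem_usage = low_cpu_mem_usage
        self.enable_torch_compile = enable_torch_compile
```

Any module that constructs a `CompressContext` gets the same object, which is what lets settings like `enable_torch_compile` be read anywhere without threading them through call signatures.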

Usage of the new API:

# Import paths for Compressor and AutoRoundConfig are assumed from this PR's module layout.
from auto_round import Compressor
from auto_round.algorithms.quantization.auto_round.config import AutoRoundConfig
from auto_round.algorithms.rotation import HadamardConfig

quant_cfg = AutoRoundConfig(bits=4, group_size=128, iters=200)
had_cfg_1 = HadamardConfig(hadamard_type="hadamard", block_size=32)
had_cfg_2 = HadamardConfig(hadamard_type="random_hadamard", block_size=64, random_seed=True)

compressor = Compressor(
    config=[quant_cfg, had_cfg_1, had_cfg_2],
    model="facebook/opt-125m",
    scheme="MXFP4",
    format="auto_round",
)

model, layer_config = compressor.quantize_and_save(
    output_dir="./output",
)

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

Refactors AutoRound toward a new “context + compressor + algorithm” architecture, introducing new compressors_new/ and context/ modules and updating scheme parsing/export helpers to support the new flow.

Changes:

  • Added new context singletons (ModelContext, CompressContext) and a new compressors_new implementation path.
  • Expanded scheme parsing to reconcile bits/data_type and support user overrides + AutoScheme integration.
  • Added new calibration utilities and algorithm scaffolding for quantization backends (AutoRound/RTN).

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
auto_round/utils/model.py Avoids runtime import cycles via TYPE_CHECKING for QuantizationScheme.
auto_round/schemes.py Adds scheme override + parsing helpers and bits/dtype reconciliation.
auto_round/formats.py Switches divisibility checks to global supported-layer constants.
auto_round/context/model_context.py Introduces model lifecycle/loading + AMP setup and forward-hook management.
auto_round/context/compress_context.py Introduces device/device_map and memory-usage knobs as shared context.
auto_round/context/base.py Adds simple singleton context base.
auto_round/context/__init__.py Package init for new context module.
auto_round/compressors_new/utils.py New utility module (layer config, gguf mapping, caching helpers, forward helpers).
auto_round/compressors_new/shard_writer.py New shard-based saver with optional safetensors support.
auto_round/compressors_new/config.py Introduces extra/legacy config dataclasses for the new compressor path.
auto_round/compressors_new/base.py New “BaseCompressor” implementation wiring contexts, formats, caching, quant loop.
auto_round/compressors_new/__init__.py Package init for compressors_new.
auto_round/compressors/utils.py Extends legacy layer-config resolution to include safetensors-only tensors and skip missing modules.
auto_round/calibration/utils.py Adds helpers for “early stop” caching and input reshaping for block tuning.
auto_round/calibration/__init__.py Package init for calibration.
auto_round/algorithms/quantization/rtn/rtn.py Adds placeholder RTN quantization module file.
auto_round/algorithms/quantization/rtn/config.py Adds RTN algorithm config stub.
auto_round/algorithms/quantization/rtn/__init__.py Package init for RTN quantization.
auto_round/algorithms/quantization/base.py Adds base quantization class stub.
auto_round/algorithms/quantization/auto_round/quantize.py Adds new AutoRound quantizer implementation (algorithm object).
auto_round/algorithms/quantization/auto_round/config.py Adds new AutoRound algorithm config.
auto_round/algorithms/quantization/auto_round/__init__.py Package init for AutoRound quantization algorithm.
auto_round/algorithms/quantization/__init__.py Package init for quantization algorithms.
auto_round/algorithms/base.py Adds base algorithm stub.
auto_round/algorithms/alg_config.py Adds base algorithm config stub.
auto_round/algorithms/__init__.py Package init for algorithms.

@wenhuach21
Contributor

If there is already an algorithm folder, what is the purpose of the compressor folder?

@n1ck-guo n1ck-guo requested review from WeiweiZhang1 and yiliu30 and removed request for xin3he March 13, 2026 05:31
import torch


class ExtraConfig:
Contributor


ExtraConfig is a monolithic catch-all config class.
ExtraConfig bundles tuning, scheme, MLLM, and diffusion settings into a single class — the opposite of llm-compressor's approach where each modifier owns its own typed config. This "one object owns everything" pattern makes it harder to add new algorithms independently and is a carryover from the old monolithic design rather than a step toward the intended modular architecture.
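To illustrate the per-modifier alternative this comment suggests: each algorithm owns a small typed dataclass instead of one catch-all `ExtraConfig`. The class and field names below are hypothetical, chosen only to show the shape of the split.

```python
from dataclasses import dataclass

@dataclass
class RTNConfig:
    """Config owned solely by the RTN algorithm (illustrative fields)."""
    bits: int = 4
    group_size: int = 128

@dataclass
class HadamardRotationConfig:
    """Config owned solely by the Hadamard rotation (illustrative fields)."""
    hadamard_type: str = "hadamard"
    block_size: int = 32

# A compressor can accept a heterogeneous list of configs and dispatch on
# type, so adding a new algorithm never touches a shared monolithic object.
configs = [RTNConfig(bits=8), HadamardRotationConfig(block_size=64)]
```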

Contributor


Despite this PR's goal of separating concerns into Context/Algorithm/Compressor, BaseCompressor still owns everything: config parsing, calibration data collection, forward hook management, quantization loop control, and model saving. By contrast, llm-compressor distributes these responsibilities across dedicated Pipeline (calibration), Modifier (algorithm logic), Session (lifecycle orchestration), and entrypoint (API) layers. The refactor restructures the file layout without achieving real decoupling.

@chensuyue chensuyue added this to the 0.12.0 milestone Mar 16, 2026
n1ck-guo and others added 3 commits March 17, 2026 17:02
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
**kwargs,
):
self.quantize_config = None
self.rotation_configs: list[BaseRotationConfig] = []
Contributor


Using two or more rotation configs sequentially on the same model is not supported. We could support layer-wise rotation configs later (not related to this PR). So for this line, I think we should support only one rotation config.

@lkk12014402
Contributor

lkk12014402 commented Mar 31, 2026


@n1ck-guo For the new API usage, would it be better to determine the order of applying configs based on the order in the config list?
If so, the rotation config probably shouldn’t be applied inside Compressor __init__; instead, all configs should be applied through a loop, like https://github.com/vllm-project/llm-compressor/blob/main/examples/transform/spinquant_example.py#L19 and http://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/pipelines/independent/pipeline.py#L38 .
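The loop-based application being suggested can be sketched as follows. The registry, config classes, and `apply` method here are illustrative stand-ins, not the project's actual API; the point is only that the user-provided list order drives the application order.

```python
class _Append:
    """Toy algorithm: records its config's class name on the 'model' (a list here)."""
    def __init__(self, cfg):
        self.cfg = cfg
    def apply(self, model):
        model.append(type(self.cfg).__name__)
        return model

class QuantCfg: pass
class RotCfg: pass

# Map config type name -> algorithm implementation (illustrative registry).
registry = {"QuantCfg": _Append, "RotCfg": _Append}

def apply_configs(model, configs, registry):
    # Apply each config strictly in the user-specified order,
    # rather than special-casing rotations in Compressor.__init__.
    for cfg in configs:
        model = registry[type(cfg).__name__](cfg).apply(model)
    return model

order = apply_configs([], [RotCfg(), QuantCfg()], registry)
```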

@wenhuach21
Contributor

wenhuach21 commented Mar 31, 2026


@n1ck-guo For the new API usage, would it be better to determine the order of applying configs based on the order in the config list? If so, the rotation config probably shouldn’t be applied inside init; instead, all configs should be applied through a loop, like https://github.com/vllm-project/llm-compressor/blob/main/examples/transform/spinquant_example.py#L19 and http://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/pipelines/independent/pipeline.py#L38

Yes, you’re right. We should preserve the algorithm order as provided by the user unless there are technical limitations. However, for unsupported or suboptimal orders, such as applying AR before Hadamard, we should log a warning and provide a recommended order.
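The warning for suboptimal orders could look roughly like this. The config class names and the rotation/tuner grouping are assumptions for illustration; only the ordering rule (rotations before tuning algorithms like AutoRound) comes from the discussion above.

```python
import logging

logger = logging.getLogger("auto_round")

# Stand-in config classes for illustration.
class AutoRoundConfig: pass
class HadamardConfig: pass

ROTATIONS = {"HadamardConfig"}
TUNERS = {"AutoRoundConfig"}

def check_config_order(configs):
    """Return True if the order is fine; warn and return False if a tuner
    appears before a rotation."""
    names = [type(c).__name__ for c in configs]
    first_tuner = next((i for i, n in enumerate(names) if n in TUNERS), None)
    last_rotation = next(
        (i for i in range(len(names) - 1, -1, -1) if names[i] in ROTATIONS), None
    )
    if first_tuner is not None and last_rotation is not None and first_tuner < last_rotation:
        logger.warning(
            "Applying %s before %s is suboptimal; recommended order: rotations first.",
            names[first_tuner], names[last_rotation],
        )
        return False
    return True
```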

@wenhuach21
Contributor

To better separate the interface from the algorithm, I suggest renaming the algorithm from AutoRound to SignRound, and updating AutoRoundConfig to SignRoundConfig. In addition, the API should accept either a string or an enum, e.g., alg_configs=["SignRound","Hadamard"].

@n1ck-guo
Contributor Author

n1ck-guo commented Apr 1, 2026


@n1ck-guo For the new API usage, would it be better to determine the order of applying configs based on the order in the config list? If so, the rotation config probably shouldn’t be applied inside Compressor __init__; instead, all configs should be applied through a loop, like https://github.com/vllm-project/llm-compressor/blob/main/examples/transform/spinquant_example.py#L19 and http://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/pipelines/independent/pipeline.py#L38 .

Added to the TODO list; it will be implemented in a later PR.

@n1ck-guo
Contributor Author

n1ck-guo commented Apr 1, 2026

To better separate the interface from the algorithm, I suggest renaming the algorithm from AutoRound to SignRound, and updating AutoRoundConfig to SignRoundConfig. In addition, the API should accept either a string or an enum, e.g., alg_configs=["SignRound","Hadamard"].

If we use strings, how should the relevant configurations be passed? Should we use the defaults?

@wenhuach21
Contributor

wenhuach21 commented Apr 1, 2026

To better separate the interface from the algorithm, I suggest renaming the algorithm from AutoRound to SignRound, and updating AutoRoundConfig to SignRoundConfig. In addition, the API should accept either a string or an enum, e.g., alg_configs=["SignRound","Hadamard"].

If we use strings, how should the relevant configurations be passed? Should we use the defaults?

Yes, just like scheme, which supports str, dict, and dataclass. In this case, supporting str helps simplify usage, while advanced users can use a class-based configuration as shown in your example.
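A normalization helper in the spirit of this suggestion might look like the sketch below: a str picks a registered config with defaults, a dict overrides defaults, and a dataclass passes through. `SignRoundConfig` and the registry keys are hypothetical names taken from the rename proposal above.

```python
from dataclasses import dataclass

@dataclass
class SignRoundConfig:
    """Hypothetical renamed AutoRoundConfig, per the suggestion above."""
    bits: int = 4
    group_size: int = 128

_ALG_REGISTRY = {"signround": SignRoundConfig}

def normalize_alg_config(cfg):
    """Accept a str (defaults), dict (override defaults), or config dataclass."""
    if isinstance(cfg, str):
        return _ALG_REGISTRY[cfg.lower()]()
    if isinstance(cfg, dict):
        cfg = dict(cfg)  # don't mutate the caller's dict
        name = cfg.pop("name", "signround")
        return _ALG_REGISTRY[name.lower()](**cfg)
    return cfg  # already a config object
```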

n1ck-guo added 2 commits April 1, 2026 15:14
Signed-off-by: n1ck-guo <heng.guo@intel.com>
*,
iters: int = 200,
lr: float = None,
minmax_lr: float = None,
lr_scheduler=None,
seqlen: int = 2048,
nsamples: int = 128,
Contributor


One TODO:

If different algorithms require different nsamples or seqlen, and the pipeline runs them sequentially for each block, the calibration process should use the maximum required values and truncate the data accordingly for each algorithm.
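The TODO above amounts to calibrating once with the maximum requirements and truncating per algorithm. A small sketch under that assumption (function names are illustrative):

```python
def plan_calibration(alg_requirements):
    """alg_requirements: list of (nsamples, seqlen) pairs, one per algorithm.
    Returns the settings to use for the single shared calibration pass."""
    max_nsamples = max(n for n, _ in alg_requirements)
    max_seqlen = max(s for _, s in alg_requirements)
    return max_nsamples, max_seqlen

def truncate_for(samples, nsamples, seqlen):
    """Cut the shared calibration data down to one algorithm's requirements.
    samples: list of token-id sequences collected with the maximum settings."""
    return [s[:seqlen] for s in samples[:nsamples]]
```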

@wenhuach21 wenhuach21 self-requested a review April 1, 2026 08:21
from auto_round.algorithms.rotation.hadamard.config import HadamardConfig, normalize_hadamard_config
from auto_round.algorithms.rotation.hadamard.transforms import build_hadamard_transform
from auto_round.algorithms.transforms.base import BaseRotation
from auto_round.algorithms.transforms.hadamard.config import HadamardConfig, normalize_hadamard_config
Contributor


I’d prefer using rotation, but it’s up to you.

n1ck-guo added 7 commits April 2, 2026 09:47
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor


Why is a .pyc file uploaded here?

n1ck-guo added 2 commits April 3, 2026 13:54
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
self.immediate_packing = is_immediate_packing
self.is_immediate_packing = is_immediate_packing
self.is_immediate_saving = is_immediate_saving
self.formats = formats
Contributor


self.formats is initialized twice.

return scheme.dataset
return "NeelNanda/pile-10k"

def post_init(self) -> None:
Contributor


Better to extract the five phases into standalone pipeline methods with explicit inputs/outputs. Each phase should become its own method with clearly documented preconditions, postconditions, and return values, so that post_init() reduces to a thin sequential orchestrator calling them in order.
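The thin-orchestrator shape being suggested can be sketched as below. The phase names and their return values are placeholders, not the compressor's actual five phases.

```python
class BaseCompressorSketch:
    """Each phase is a standalone method; post_init() only sequences them."""

    def post_init(self):
        scheme = self._parse_scheme()
        layer_config = self._resolve_layer_config(scheme)
        self._setup_contexts(layer_config)

    def _parse_scheme(self):
        # Precondition: raw user config is set. Postcondition: validated scheme.
        return {"bits": 4}

    def _resolve_layer_config(self, scheme):
        # Precondition: validated scheme. Postcondition: per-layer config map.
        return {"layer0": scheme}

    def _setup_contexts(self, layer_config):
        # Postcondition: shared contexts reflect the resolved per-layer config.
        self.layer_config = layer_config
```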

"""Generate per-layer config via AutoScheme delta-loss selection."""
if self.model_context.is_mllm:
logger.info("AutoScheme is not yet supported for multimodal LLMs.")
sys.exit(-1)
Contributor


Better to raise a RuntimeError instead of calling sys.exit().
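The suggested change, sketched with an illustrative helper name: raising lets callers catch and handle the condition, whereas sys.exit() kills the interpreter.

```python
def check_mllm_supported(is_mllm):
    """Raise instead of exiting so callers can recover or report cleanly."""
    if is_mllm:
        raise RuntimeError("AutoScheme is not yet supported for multimodal LLMs.")
```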


pass

return DiffusionZeroShotCompressor(alg_configs, **local_args, **kwargs)
Contributor


Better to move the 8 dynamically-created empty Mixin classes to module-level definitions, or replace them with a registry-based dispatch.
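The registry-based dispatch alternative mentioned here could look roughly like this; the registry keys and the compressor class are illustrative, not the PR's actual names.

```python
# Map (model_type, mode) -> compressor class instead of creating empty
# Mixin classes dynamically.
_COMPRESSOR_REGISTRY = {}

def register_compressor(model_type, mode):
    """Class decorator that records a compressor under a dispatch key."""
    def wrap(cls):
        _COMPRESSOR_REGISTRY[(model_type, mode)] = cls
        return cls
    return wrap

@register_compressor("diffusion", "zero_shot")
class DiffusionZeroShotCompressor:
    def __init__(self, alg_configs, **kwargs):
        self.alg_configs = alg_configs

def create_compressor(model_type, mode, alg_configs, **kwargs):
    # Dispatch by lookup; unknown combinations fail loudly with KeyError.
    cls = _COMPRESSOR_REGISTRY[(model_type, mode)]
    return cls(alg_configs, **kwargs)
```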
