[BUG] HQQ quantized model inference fails for the latest version of pruna #448

@ParagEkbote

Describe the bug

A model smashed with Pruna version 0.2.8 fails to load with the latest version of Pruna, possibly due to an upstream change in the HQQ quantizer API. WDYT?

cc: @davidberenstein1957

What I did

I smashed the model using Pruna version 0.2.8. When I try to load it using the latest version, loading fails with the following traceback:

INFO - Could not load HQQ model using pipeline, trying generic HQQ pipeline...

WARNING - Setting factorizer deprecated. Please use config.add(dict(factorizer=True)).
Traceback (most recent call last):
  File "/app/app.py", line 52, in <module>
    model = PrunaModel.from_pretrained(model_repo)
  File "/usr/local/lib/python3.10/site-packages/pruna/telemetry/metrics.py", line 218, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pruna/engine/pruna_model.py", line 364, in from_pretrained
    model, smash_config = load_pruna_model_from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/pruna/engine/load.py", line 184, in load_pruna_model_from_pretrained
    return load_pruna_model(model_path=path, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pruna/engine/load.py", line 79, in load_pruna_model
    model = resmash_fn(model, smash_config)
  File "/usr/local/lib/python3.10/site-packages/pruna/engine/load.py", line 207, in resmash
    smash_config_subset[algorithm_name] = True
  File "/usr/local/lib/python3.10/site-packages/pruna/config/smash_config.py", line 569, in __setitem__
    self.add({name: value})
  File "/usr/local/lib/python3.10/site-packages/pruna/config/smash_config.py", line 612, in add
    self._configuration[key] = value
  File "/usr/local/lib/python3.10/site-packages/ConfigSpace/configuration.py", line 187, in __setitem__
    param = self.config_space[key]
  File "/usr/local/lib/python3.10/site-packages/ConfigSpace/configuration_space.py", line 874, in __getitem__
    return self._dag.nodes[key].hp
KeyError: 'factorizer'

from pruna import PrunaModel

repo = "AINovice2005/SmolLM2-360M-smashed"

tokenizer = PrunaModel.from_pretrained("HuggingFaceTB/SmolLM2-360M-smashed")

model = PrunaModel.from_pretrained(repo)
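
One plausible reading of the last traceback frames is that the saved config names an entry ('factorizer') that the newer config space no longer registers, so re-enabling it raises KeyError. The sketch below only illustrates that failure mode with a toy stand-in; `ToyConfigSpace`, `reapply_saved_algorithms`, and the name "quantizer" are hypothetical and are not Pruna or ConfigSpace APIs.

```python
class ToyConfigSpace:
    """Hypothetical stand-in for a config space: only registered names can be set."""

    def __init__(self, known_names):
        self._known = set(known_names)
        self._values = {}

    def __setitem__(self, key, value):
        # Mirrors the shape of the failure in the traceback: unknown key -> KeyError.
        if key not in self._known:
            raise KeyError(key)
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


def reapply_saved_algorithms(space, saved_names):
    """Enable each saved name; return the names the space does not recognise."""
    missing = []
    for name in saved_names:
        try:
            space[name] = True
        except KeyError:
            missing.append(name)
    return missing


# A "newer" space that knows "quantizer" (assumed name) but not "factorizer".
space = ToyConfigSpace(["quantizer"])
missing = reapply_saved_algorithms(space, ["quantizer", "factorizer"])
print("names missing from the current config space:", missing)
```

Collecting the unknown names instead of failing on the first one makes it easier to report exactly which saved entries a newer release cannot map.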

Expected behavior

The model should load normally.

Environment

  • pruna version: 0.2.8
  • python version: 3.11
  • Operating System: 5.15.0-1084-aws-x86_64-with-glibc2.31
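
Since the model was smashed with one Pruna version and loaded with another, it may help to record the versions actually installed in the loading environment. A minimal sketch using only the standard library follows; the distribution names "pruna", "hqq", and "ConfigSpace" are assumed to match what is installed, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust to the actual environment.
    for name, ver in installed_versions(["pruna", "hqq", "ConfigSpace"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in both the environment that smashed the model and the one that fails to load it gives a direct comparison of the versions involved.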

Labels

bug (Something isn't working)
