try to support gemma4#1656

Open
wenhuach21 wants to merge 14 commits into main from support_gemma4

Conversation

@wenhuach21
Contributor

Description

Please briefly describe your main changes and the motivation.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Copilot AI review requested due to automatic review settings April 3, 2026 07:50
Contributor

Copilot AI left a comment


Pull request overview

This PR aims to add initial support for the gemma4 model family by adapting the block input caching/quantization flow to handle variable block shapes and extra cached inputs.

Changes:

  • Allow wrapper blocks to forward positional args through to decoder layers.
  • Add a predefined fixed-attribute lookup for special model types (gemma4).
  • Extend caching/quantization to support variable-shaped block groupings and extra per-block cached inputs.
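
The positional-arg forwarding change can be sketched roughly as follows. This is a minimal illustration, not the actual `WrapperMultiblock` code from `auto_round/wrapper.py`; the class body and the tuple-unwrapping convention are assumptions based on typical HuggingFace decoder layers:

```python
import torch.nn as nn


class WrapperMultiblock(nn.Module):
    """Minimal sketch: wraps a list of decoder layers and forwards
    positional args (e.g. position embeddings) through each layer
    instead of passing only hidden_states and keyword args."""

    def __init__(self, module_list):
        super().__init__()
        self.layers = nn.ModuleList(module_list)

    def forward(self, hidden_states, *args, **kwargs):
        for layer in self.layers:
            out = layer(hidden_states, *args, **kwargs)
            # decoder layers often return a tuple; keep the hidden states
            hidden_states = out[0] if isinstance(out, tuple) else out
        return hidden_states
```

Forwarding `*args` unchanged lets models whose layers take extra positional inputs (as gemma4 apparently does) run through the wrapper without a per-model signature.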

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File descriptions:

  • auto_round/wrapper.py: Forwards positional args through WrapperMultiblock.forward to improve model compatibility.
  • auto_round/special_model_handler.py: Introduces predefined fixed attributes (e.g., gemma4) retrievable from model.config.model_type.
  • auto_round/compressors/base.py: Uses fixed attributes to alter block caching/quantization for variable block shapes and additional cached inputs.
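
The fixed-attribute lookup described for auto_round/special_model_handler.py might look something like this. All names here (the table, the attribute keys, the helper) are hypothetical; only the idea of keying on model.config.model_type comes from the PR:

```python
# Hypothetical lookup table: model_type -> attributes that override the
# generic block caching/quantization behavior (keys are illustrative only).
PREDEFINED_FIXED_ATTRS = {
    "gemma4": {
        "variable_block_shapes": True,  # blocks may receive differently shaped inputs
        "extra_cached_inputs": True,    # cache additional per-block inputs
    },
}


def get_fixed_attrs(model):
    """Return the special-case attributes for a model, or an empty dict."""
    model_type = getattr(getattr(model, "config", None), "model_type", None)
    return PREDEFINED_FIXED_ATTRS.get(model_type, {})
```

The compressor can then consult this dict to decide whether to take the variable-shape caching path, leaving all other model types on the default path.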

wenhuach21 and others added 6 commits April 3, 2026 16:08
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@wenhuach21
Contributor Author

wenhuach21 commented Apr 3, 2026

TODO @n1ck-guo I'll leave it to you:

1. Consolidate with your PR. While this PR is more general, it uses a large amount of VRAM during calibration.

2. Add an argument to the API to let users configure this, since it's not easy to determine whether a model has variable block inputs. One possible approach is to probe with sample data, but that would require loading all the blocks, which is costly.
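
One way the proposed opt-in argument could look, purely as a sketch of the tri-state behavior described above; the function name, parameter name, and the special-case set are all invented for illustration and are not the actual AutoRound API:

```python
def resolve_variable_block_inputs(model, variable_block_inputs=None):
    """Hypothetical resolution of a user-facing flag.

    variable_block_inputs:
        True  -> use the general (higher-VRAM) caching path
        False -> use the standard fixed-shape path
        None  -> fall back to a predefined per-model-type table,
                 since probing with sample data would require
                 loading all blocks, which is costly
    """
    if variable_block_inputs is None:
        model_type = getattr(getattr(model, "config", None), "model_type", None)
        variable_block_inputs = model_type in {"gemma4"}  # assumed special-case set
    return variable_block_inputs
```

An explicit user value always wins; the table is only a default, which keeps the costly probing approach out of the common path.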
