Transformer specification and auto-generation method for existing models #164

Description

@aftersnow

Feature request:

Transformer is the dominant architecture for modern LLMs, and its design has largely converged. For example, most state-of-the-art open-source models adopt GQA or MLA for the attention layer and MoE for the MLP layer. As a result, a Transformer can be viewed as a composition of standardized building blocks, which makes it possible to abstract a unified architectural specification that spans different open-source models and can serve as the Transformer specification in ModelPack. Many valuable capabilities then become possible on top of this specification, such as the engine-side automation described in the use case below.
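To make the idea concrete, here is a minimal sketch of what such a specification could look like, assuming a Python dataclass encoding. All field and type names here are illustrative assumptions, not the schema of the in-progress PR.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical sketch of a unified Transformer spec. Field names are
# illustrative and do not reflect the actual ModelPack schema.

@dataclass
class AttentionSpec:
    kind: Literal["mha", "gqa", "mla"]
    num_heads: int
    num_kv_heads: Optional[int] = None   # GQA: fewer KV heads than query heads
    head_dim: int = 128
    rope_theta: float = 10000.0

@dataclass
class MLPSpec:
    kind: Literal["dense", "moe"]
    hidden_dim: int
    intermediate_dim: int
    num_experts: Optional[int] = None        # MoE only
    experts_per_token: Optional[int] = None  # MoE only

@dataclass
class TransformerSpec:
    num_layers: int
    vocab_size: int
    attention: AttentionSpec
    mlp: MLPSpec
    norm: Literal["rmsnorm", "layernorm"] = "rmsnorm"
```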

Expected Outcome:

  • Jointly complete a unified Transformer specification (an in-progress PR already exists)
  • Using vLLM and SGLang, conduct POCs on three or more mainstream open-source Transformer models based on this specification
  • Design a workflow or Claude Skills that can automatically generate Transformer specification definitions from models in the Hugging Face transformers repository (see the sketch after this list)
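As a rough sketch of the third item, auto-generation could start from the model's Hugging Face config. The converter below maps Llama-style AutoConfig fields onto the hypothetical spec classes above; real config field names differ across architectures, so `spec_from_hf` and its field mapping are illustrative assumptions only.

```python
from transformers import AutoConfig

# Hypothetical converter from a Hugging Face config to the spec sketched
# above. It assumes Llama-style field names (num_key_value_heads,
# rope_theta, ...); other architectures name these fields differently.

def spec_from_hf(model_id: str) -> TransformerSpec:
    cfg = AutoConfig.from_pretrained(model_id)
    n_heads = cfg.num_attention_heads
    n_kv = getattr(cfg, "num_key_value_heads", n_heads)
    attention = AttentionSpec(
        kind="gqa" if n_kv < n_heads else "mha",
        num_heads=n_heads,
        num_kv_heads=n_kv,
        head_dim=cfg.hidden_size // n_heads,
        rope_theta=getattr(cfg, "rope_theta", 10000.0),
    )
    mlp = MLPSpec(
        kind="dense",
        hidden_dim=cfg.hidden_size,
        intermediate_dim=cfg.intermediate_size,
    )
    return TransformerSpec(
        num_layers=cfg.num_hidden_layers,
        vocab_size=cfg.vocab_size,
        attention=attention,
        mlp=mlp,
    )
```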

Use case:

For inference engines, the specification enables automatic support for multiple Transformer models: an engine that implements the standardized building blocks once can serve newly trained Transformer models without per-model adaptation, as sketched below.
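A minimal sketch of that engine-side dispatch, assuming the spec classes above. The kernel names are invented for illustration and are not vLLM or SGLang APIs.

```python
# Hypothetical engine-side dispatch: the engine keeps one implementation
# per building block and assembles any spec-described model from them,
# with no per-model code. Kernel names below are made up.

def plan_execution(spec: TransformerSpec) -> list[str]:
    """Return the per-layer kernel plan an engine might derive from a spec."""
    attn_kernel = {
        "mha": "paged_mha",
        "gqa": "paged_gqa",
        "mla": "paged_mla",
    }[spec.attention.kind]
    mlp_kernel = "fused_moe" if spec.mlp.kind == "moe" else "fused_dense_mlp"
    return [f"layer {i}: {attn_kernel} -> {mlp_kernel}"
            for i in range(spec.num_layers)]
```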
