AttributeError in _get_actual_bias when using FP8 quantized T5 with phase-level CPU offload #781

@vbhakta8

Description

Environment

  • LightX2V version: latest main branch
  • Python: 3.11
  • PyTorch: 2.x
  • GPU: NVIDIA CUDA

Bug Description

When running WAN 2.2 I2V with FP8 quantized T5 encoder and CPU offload at phase granularity, the _get_actual_bias() method in mm_weight.py raises an AttributeError because it accesses self.bias directly without checking if the attribute exists.

Configuration

{
  "t5_cpu_offload": true,
  "t5_offload_granularity": "phase",
  "t5_quantized": true,
  "t5_quant_scheme": "fp8-q8f"
}

Error Traceback

File "lightx2v/models/input_encoders/hf/wan/t5/model.py", line 524, in forword_attn_with_offload
    q = attn_phase.attn_q.apply(x.squeeze(0)).view(b, -1, n, c)
File "lightx2v/common/ops/mm/mm_weight.py", line 1319, in apply
    self._get_actual_bias(),
File "lightx2v/common/ops/mm/mm_weight.py", line 152, in _get_actual_bias
    if self.bias is None:
       ^^^^^^^^^
AttributeError: 'MMWeightWfp8channelAfp8channeldynamicQ8F' object has no attribute 'bias'

Root Cause

In MMWeightTemplate._get_actual_bias() (line 152), the code directly accesses self.bias without first checking if the attribute exists:

def _get_actual_bias(self, bias=None):
    if bias is not None:
        ...
    else:
        if self.bias is None:  # <-- AttributeError if self.bias doesn't exist!
            return None

When create_cpu_buffer=True is used with phase-level offload, load_quantized() only initializes self.bias = None if bias is in base_attrs, and _update_base_attrs() only adds bias to base_attrs if bias_name is not None. For attention layers without a bias term (such as the T5 attention Q/K/V projections), the bias attribute is therefore never created, so the unchecked access in _get_actual_bias() raises.
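
The failure mode can be reproduced outside LightX2V with a small stand-in class; the names below (QuantWeightStub, has_bias) are illustrative only and not part of the library:

# Minimal stand-alone sketch of the failure mode (illustrative names, not LightX2V code).
class QuantWeightStub:
    def __init__(self, has_bias):
        self.weight = object()        # placeholder for the quantized weight tensor
        if has_bias:                  # the attribute is only created when a bias exists,
            self.bias = None          # mirroring the load_quantized()/base_attrs behavior

    def _get_actual_bias(self, bias=None):
        if bias is not None:
            return bias
        if self.bias is None:         # direct access: raises if `bias` was never created
            return None
        return self.bias

QuantWeightStub(has_bias=True)._get_actual_bias()       # returns None, as expected
try:
    QuantWeightStub(has_bias=False)._get_actual_bias()  # bias-less layer, e.g. T5 attn Q/K/V
except AttributeError as err:
    print(err)  # 'QuantWeightStub' object has no attribute 'bias'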

Proposed Fix

Change line 152 in lightx2v/common/ops/mm/mm_weight.py:

# Before
if self.bias is None:

# After  
if not hasattr(self, "bias") or self.bias is None:
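
An equivalent alternative (a sketch, not necessarily the project's preferred style) is getattr with a default, which collapses the missing-attribute and bias-is-None cases into a single check:

# Alternative (equivalent)
if getattr(self, "bias", None) is None:

Either form treats a missing bias attribute the same as self.bias being None.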

Workarounds

  • Use t5_offload_granularity: "block" instead of "phase"
  • Use a different T5 quant scheme (e.g., fp8-vllm or int8-vllm)
  • Disable T5 quantization: t5_quantized: false
