AttributeError in _get_actual_bias when using FP8 quantized T5 with phase-level CPU offload #781

@vbhakta8

Description

Environment

  • LightX2V version: latest main branch
  • Python: 3.11
  • PyTorch: 2.x
  • GPU: NVIDIA CUDA

Bug Description

When running WAN 2.2 I2V with FP8 quantized T5 encoder and CPU offload at phase granularity, the _get_actual_bias() method in mm_weight.py raises an AttributeError because it accesses self.bias directly without checking if the attribute exists.

Configuration

{
  "t5_cpu_offload": true,
  "t5_offload_granularity": "phase",
  "t5_quantized": true,
  "t5_quant_scheme": "fp8-q8f"
}

Error Traceback

File "lightx2v/models/input_encoders/hf/wan/t5/model.py", line 524, in forword_attn_with_offload
    q = attn_phase.attn_q.apply(x.squeeze(0)).view(b, -1, n, c)
File "lightx2v/common/ops/mm/mm_weight.py", line 1319, in apply
    self._get_actual_bias(),
File "lightx2v/common/ops/mm/mm_weight.py", line 152, in _get_actual_bias
    if self.bias is None:
       ^^^^^^^^^
AttributeError: 'MMWeightWfp8channelAfp8channeldynamicQ8F' object has no attribute 'bias'

Root Cause

In MMWeightTemplate._get_actual_bias() (line 152), the code directly accesses self.bias without first checking if the attribute exists:

def _get_actual_bias(self, bias=None):
    if bias is not None:
        ...
    else:
        if self.bias is None:  # <-- AttributeError if self.bias doesn't exist!
            return None

When create_cpu_buffer=True is used with phase-level offload, load_quantized() only initializes self.bias = None if bias is in base_attrs, and _update_base_attrs() only adds bias to base_attrs if bias_name is not None. For attention layers without a bias term (such as the T5 attention Q/K/V projections), the bias attribute is therefore never created, so the unchecked access in _get_actual_bias() raises.
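
The failure mode can be reproduced outside LightX2V with a small stand-in class; the names below (QuantWeightStub, has_bias) are illustrative only and not part of the library:

# Minimal stand-alone sketch of the failure mode (illustrative names, not LightX2V code).
class QuantWeightStub:
    def __init__(self, has_bias):
        self.weight = object()        # placeholder for the quantized weight tensor
        if has_bias:                  # the attribute is only created when a bias exists,
            self.bias = None          # mirroring the load_quantized()/base_attrs behavior

    def _get_actual_bias(self, bias=None):
        if bias is not None:
            return bias
        if self.bias is None:         # direct access: raises if `bias` was never created
            return None
        return self.bias

QuantWeightStub(has_bias=True)._get_actual_bias()       # returns None, as expected
try:
    QuantWeightStub(has_bias=False)._get_actual_bias()  # bias-less layer, e.g. T5 attn Q/K/V
except AttributeError as err:
    print(err)  # 'QuantWeightStub' object has no attribute 'bias'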

Proposed Fix

Change line 152 in lightx2v/common/ops/mm/mm_weight.py:

# Before
if self.bias is None:

# After  
if not hasattr(self, "bias") or self.bias is None:
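
An equivalent alternative (a sketch, not necessarily the project's preferred style) is getattr with a default, which collapses the missing-attribute and bias-is-None cases into a single check:

# Alternative (equivalent)
if getattr(self, "bias", None) is None:

Either form treats a missing bias attribute the same as self.bias being None.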

Workarounds

  • Use t5_offload_granularity: "block" instead of "phase"
  • Use a different T5 quant scheme (e.g., fp8-vllm or int8-vllm)
  • Disable T5 quantization: t5_quantized: false
