Thank you for your excellent work. I noticed that in an earlier version of the code, the `AttnProjection` module used causal attention, which was also explicitly described in the early version of the arXiv paper (https://arxiv.org/pdf/2502.20321v1):

> To ensure compatibility with autoregressive generation, the factorization blocks are configured with causal attention.

However, in the current codebase this appears to have been changed to standard bidirectional attention:
Line 41 in bb8012e:

```python
x = scaled_dot_product_attention(q, k, v)
```
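For context, the two variants differ only in whether future positions are masked out of the attention scores. Below is a minimal NumPy sketch of that difference (my own illustration, not code from this repository; in PyTorch the same switch is the `is_causal=True` flag of `torch.nn.functional.scaled_dot_product_attention`):

```python
import numpy as np

def attention(q, k, v, causal=False):
    """Single-head scaled dot-product attention over (seq_len, dim) arrays."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if causal:
        # Mask future positions: token i may only attend to tokens j <= i.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Softmax over the key axis (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))

_, w_bidir = attention(q, k, v, causal=False)   # every row attends everywhere
_, w_causal = attention(q, k, v, causal=True)   # lower-triangular weights
```

With bidirectional attention each token's output depends on the whole sequence, which is incompatible with left-to-right autoregressive decoding unless the module is only used on fully available inputs (e.g. encoding a complete image); that trade-off is exactly what I am asking about.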
May I ask what motivated this change, and whether there were specific considerations behind it?
More directly: what differences do these two choices make in practice, specifically in terms of reconstruction quality, semantic accuracy, understanding capability, and autoregressive generation performance?