Hi, thanks for your great work.
I noticed that in the MixedAttention function, the following code first computes self-attention for the query (q) within its corresponding chunk:
```python
# self attn
_, _, _, _, self_attn_out_sh, self_attn_lse_hs, _, _ = (
    _flash_attn_varlen_forward(
        q=q,
        k=k,
        v=v,
        cu_seqlens_q=self_attn_cu_seqlen,
        cu_seqlens_k=self_attn_cu_seqlen,
        max_seqlen_q=max_seqlen,
        max_seqlen_k=max_seqlen,
        softmax_scale=softmax_scale,
        causal=True,
        dropout_p=0.0,
    )
)
```

However, `max_seqlen` is clearly larger than the maximum chunk length implied by `self_attn_cu_seqlen`.
(Line 96 in b5d5836: `max_seqlen_q=max_seqlen,`)
Does this lead to any potential issues, such as reduced computational efficiency or unintended behavior in the attention computation?
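For illustration, the tight bound implied by the cumulative offsets could be computed as below. This is a minimal NumPy sketch under my own assumptions: the variable names are hypothetical, and the real code would operate on torch tensors rather than NumPy arrays.

```python
import numpy as np

# Hypothetical example: cu_seqlens-style cumulative offsets for three
# chunks of lengths 3, 5, and 2 (the format flash-attn's varlen API uses).
cu_seqlens = np.array([0, 3, 8, 10])

# Per-chunk lengths are the consecutive differences of the offsets.
chunk_lens = np.diff(cu_seqlens)  # [3, 5, 2]

# The tight maximum sequence length for these chunks, as opposed to
# passing a global max_seqlen that may exceed every chunk.
tight_max_seqlen = int(chunk_lens.max())  # 5
```

Passing a larger-than-necessary `max_seqlen` should not change which query/key pairs attend to each other (that is governed by the `cu_seqlens` offsets), so my question is mainly about whether it costs efficiency.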