fix mask return-type contract regression and add correctness guard for #47019
kaixuanliu wants to merge 8 commits into
xpu Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>
vasqu left a comment
Some smaller initial comments
if isinstance(attention_mask, dict):
    causal_mask = attention_mask[self.config.layer_types[0]]
else:
    causal_mask = create_causal_mask(
        config=self.config,
        inputs_embeds=inputs_embeds,
        attention_mask=attention_mask,
        past_key_values=past_key_values,
        position_ids=position_ids,
    )
I'd rather we follow the pattern of transformers/src/transformers/models/gemma2/modeling_gemma2.py, lines 438 to 452 in 8698b5a.
That does look better. I have updated the code.
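The pattern discussed in this thread can be sketched roughly as follows: reuse a per-layer-type mask mapping if the caller already supplies one, otherwise build the mask and wrap it in a mapping, so the downstream code always sees a dict. This is a minimal illustration, not the code from the PR or from `modeling_gemma2.py`; `build_mask_stub` and `resolve_mask_mapping` are hypothetical names, and the stub only stands in for a real mask builder such as `create_causal_mask`.

```python
from typing import Any


def build_mask_stub(**mask_kwargs: Any) -> str:
    """Hypothetical stand-in for a real mask builder such as create_causal_mask."""
    return f"mask(built from {sorted(mask_kwargs)})"


def resolve_mask_mapping(attention_mask: Any, layer_type: str, **mask_kwargs: Any) -> dict:
    """Return a {layer_type: mask} mapping regardless of what the caller passed in."""
    # A dict means the masks were already prepared upstream; reuse them as-is.
    if isinstance(attention_mask, dict):
        return attention_mask
    # Otherwise build the mask once and key it by the layer type that consumes it.
    mask = build_mask_stub(attention_mask=attention_mask, **mask_kwargs)
    return {layer_type: mask}


# Usage: "full_attention" is only an example key chosen for this sketch.
prepared = {"full_attention": "precomputed mask"}
reused = resolve_mask_mapping(prepared, "full_attention")
built = resolve_mask_mapping(None, "full_attention", position_ids=None)
```

Keeping a single return type means each layer can look up its mask by layer type without branching on whether the mask was precomputed.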
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
    "past_key_values": past_key_values,
    "position_ids": position_ids,
}
causal_mask_mapping = {self.config.layer_types[0]: create_causal_mask(**mask_kwargs)}
Last nit: can we directly use the layer type that is intended here, e.g. "deepseek_sparse_attention"?
Also cc @Cyrilvallez to double check in the end please 🫡
[For maintainers] Suggested jobs to run (before merge)
run-slow: deepseek_v32, glm_moe_dsa
This PR fixes issue #46962.
- create_masks_for_generate func
- _can_skip_causal_mask_xpu: only when kv_offset == 0 can we use the fast path

@ArthurZucker @Cyrilvallez please help review. Thanks!
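To illustrate why a fast path that skips building the mask might need `kv_offset == 0`, here is a toy sketch in plain Python. It assumes that `kv_offset` shifts the absolute position of each key slot while query positions are given separately; that reading, and the helper names `causal_mask` and `top_left_causal`, are assumptions made for this example and are not taken from the PR. The real `_can_skip_causal_mask_xpu` likely checks additional conditions.

```python
def causal_mask(q_positions, kv_length, kv_offset=0):
    """Explicit causal mask: a query at absolute position q may attend to
    key slot `slot`, whose absolute position is assumed to be slot + kv_offset."""
    return [[(slot + kv_offset) <= q for slot in range(kv_length)] for q in q_positions]


def top_left_causal(q_length, kv_length):
    """Mask implied by a plain lower-triangular (top-left aligned) causal fast path."""
    return [[slot <= row for slot in range(kv_length)] for row in range(q_length)]


# With kv_offset == 0 and queries starting at position 0, both masks agree.
aligned = causal_mask(q_positions=[0, 1, 2, 3], kv_length=4, kv_offset=0)

# With a non-zero offset, the explicit mask is no longer lower-triangular,
# so silently replacing it with the triangular fast path would change results.
shifted = causal_mask(q_positions=[4, 5, 6, 7], kv_length=4, kv_offset=2)

fast_path_ok_aligned = aligned == top_left_causal(4, 4)
fast_path_ok_shifted = shifted == top_left_causal(4, 4)
```

Under these assumptions the triangular shortcut is only equivalent to the explicit mask in the aligned case, which is consistent with guarding the fast path on `kv_offset == 0`.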