fix mask return-type contract regression and add correctness guard for by kaixuanliu · Pull Request #47019 · huggingface/transformers

kaixuanliu · 2026-07-02T08:36:18Z

This PR fixes issue #46962.

Fix the return-type regression in the create_masks_for_generate function.
Add a stricter restriction to _can_skip_causal_mask_xpu: the fast path may be used only when kv_offset == 0.
@ArthurZucker @Cyrilvallez please help review. Thanks!
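
For readers who want to see why the kv_offset == 0 restriction matters, here is a minimal sketch in plain Python. It assumes the fast path relies on a kernel's built-in causal flag that aligns the causal triangle to the top-left corner (query i sees keys 0..i); that is an assumption about the backend, not something stated in this PR. The helper names are hypothetical and are not the functions in masking_utils.py.

```python
def explicit_causal_mask(q_len, kv_len, kv_offset):
    """Reference mask: query i sits at absolute position kv_offset + i,
    so it may attend to every key j with j <= i + kv_offset."""
    return [[j <= i + kv_offset for j in range(kv_len)] for i in range(q_len)]


def implicit_top_left_causal_mask(q_len, kv_len):
    """What a top-left-aligned built-in causal flag would apply (assumed behavior):
    query i may attend only to keys j with j <= i, ignoring any cache offset."""
    return [[j <= i for j in range(kv_len)] for i in range(q_len)]


def can_use_fast_path(kv_offset, has_padding=False):
    """Hypothetical guard: skip building an explicit mask only when there is
    no cache offset. The padding check is an extra assumption of this sketch."""
    return kv_offset == 0 and not has_padding


# No cached tokens: both masks agree, so skipping the explicit mask is safe.
same = explicit_causal_mask(3, 3, 0) == implicit_top_left_causal_mask(3, 3)

# Two cached tokens, two new queries: the masks disagree, so the fast path
# would hide keys that the queries are allowed to see.
differs = explicit_causal_mask(2, 4, 2) != implicit_top_left_causal_mask(2, 4)
```

With kv_offset = 2, the first new query should see keys 0 to 2, but the top-left-aligned mask would let it see only key 0, which is why the sketch's guard refuses the fast path in that case.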

xpu Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

vasqu

Some smaller initial comments

vasqu · 2026-07-02T14:13:20Z

+        if isinstance(attention_mask, dict):
+            causal_mask = attention_mask[self.config.layer_types[0]]
+        else:
+            causal_mask = create_causal_mask(
+                config=self.config,
+                inputs_embeds=inputs_embeds,
+                attention_mask=attention_mask,
+                past_key_values=past_key_values,
+                position_ids=position_ids,
+            )


I'd rather we follow the pattern of

transformers/src/transformers/models/gemma2/modeling_gemma2.py

Lines 438 to 452 in 8698b5a

    # It may already have been prepared by e.g. `generate`
    if not isinstance(causal_mask_mapping := attention_mask, dict):
        # Prepare mask arguments
        mask_kwargs = {
            "config": self.config,
            "inputs_embeds": inputs_embeds,
            "attention_mask": attention_mask,
            "past_key_values": past_key_values,
            "position_ids": position_ids,
        }
        # Create the masks
        causal_mask_mapping = {
            "full_attention": create_causal_mask(**mask_kwargs),
            "sliding_attention": create_sliding_window_causal_mask(**mask_kwargs),
        }

This looks better. I have updated the code.
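
A minimal, self-contained sketch of the pattern discussed above: accept a mask mapping that a caller such as generate has already prepared, and build one otherwise. The names build_mask and prepare_mask_mapping are hypothetical stand-ins for create_causal_mask and the model's forward; masks are nested lists rather than tensors, and the same mask is reused for every layer type purely for brevity.

```python
def build_mask(seq_len):
    """Hypothetical stand-in for a mask builder: a plain lower-triangular mask."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def prepare_mask_mapping(attention_mask, seq_len,
                         layer_types=("full_attention", "sliding_attention")):
    """Return a dict of masks keyed by layer type.

    If `attention_mask` is already a dict, it is assumed to have been prepared
    upstream and is returned unchanged; otherwise a fresh mapping is built.
    """
    # The assignment expression binds the input first, then the isinstance
    # check decides whether it still needs to be converted into a mapping.
    if not isinstance(causal_mask_mapping := attention_mask, dict):
        causal_mask_mapping = {
            layer_type: build_mask(seq_len) for layer_type in layer_types
        }
    return causal_mask_mapping


# Already prepared upstream: passed through untouched.
prepared = {"full_attention": build_mask(2)}
passthrough = prepare_mask_mapping(prepared, seq_len=2)

# Not prepared (no mask given): a mapping is built for each layer type.
built = prepare_mask_mapping(None, seq_len=2)
```

Whichever branch runs, downstream code always receives a dict, so each layer can look up its mask by layer type without checking the type again.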

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

vasqu · 2026-07-02T15:14:24Z

+                "past_key_values": past_key_values,
+                "position_ids": position_ids,
+            }
+            causal_mask_mapping = {self.config.layer_types[0]: create_causal_mask(**mask_kwargs)}


Last nit: can we directly use the layer type that is intended here, e.g. "deepseek_sparse_attention"?
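
One possible motivation for this nit, sketched under assumptions: keying the mapping by layer_types[0] depends on the order of the config's list, while a literal key states the intent directly. The mixed layer_types list below is hypothetical and only serves to show the difference; the helper names are also illustrative and not part of the PR.

```python
SPARSE_LAYER_TYPE = "deepseek_sparse_attention"  # literal taken from the review comment


def mapping_keyed_by_index(layer_types, mask):
    """Key the mask by whatever layer type happens to come first in the config."""
    return {layer_types[0]: mask}


def mapping_keyed_by_name(mask):
    """Key the mask by the layer type the model actually intends to use."""
    return {SPARSE_LAYER_TYPE: mask}


# Hypothetical config in which the sparse layer type is not listed first.
layer_types = ["full_attention", SPARSE_LAYER_TYPE]
mask = [[True]]

by_index = mapping_keyed_by_index(layer_types, mask)
by_name = mapping_keyed_by_name(mask)

# The index-based mapping ends up under "full_attention", so a lookup by the
# sparse layer type would fail; the name-based mapping does not have this issue.
index_has_sparse_key = SPARSE_LAYER_TYPE in by_index
name_has_sparse_key = SPARSE_LAYER_TYPE in by_name
```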

vasqu · 2026-07-02T15:33:50Z

Also cc @Cyrilvallez to double check in the end please 🫡

github-actions · 2026-07-03T01:18:35Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: deepseek_v32, glm_moe_dsa

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

github-actions · 2026-07-03T01:52:07Z

CI recap

Dashboard: View test results in Grafana
Latest run: 28632012077:2
Result: success | Jobs: 15 | Tests: 65,281 | Failures: 0 | Duration: 18h 36m

kaixuanliu added 4 commits July 2, 2026 08:30

fix mask return-type contract regression and add correctness guard for

f614c25

xpu Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

update comment

8f3d6dc

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

fix failed CI cases for deepseek_v32

61b8400

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

Merge branch 'main' into mask_fix_xpu

e523167

vasqu reviewed Jul 2, 2026


kaixuanliu and others added 3 commits July 2, 2026 22:43

Update src/transformers/masking_utils.py

c1beedc

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

update code

77ce377

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

fix LINT issue

16ec359

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

vasqu reviewed Jul 2, 2026


update code

132c6c4

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

kaixuanliu wants to merge 8 commits into huggingface:main from kaixuanliu:mask_fix_xpu


2 participants
