
Fix position_ids docstring in modeling_flash_attention_utils.py#44547

Open
mvanhorn wants to merge 2 commits into huggingface:main from mvanhorn:osc/44373-fix-position-ids-docstring

Conversation


@mvanhorn mvanhorn commented Mar 9, 2026

Fixes #44373

Summary

  • Corrected the docstring for the position_ids parameter in prepare_fa_kwargs_from_position_ids and _prepare_from_posids, which incorrectly described attention-mask semantics ("Boolean or int tensor... 1 means valid and 0 means not valid")
  • The docstring now accurately describes position indices behavior
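To make the distinction concrete, here is a hypothetical illustration (plain Python, not taken from the PR diff) of why position indices are not interchangeable with attention-mask flags; restarting indices per packed sequence is the case the flash-attention helpers handle:

```python
# Hypothetical values: one row packing two sequences of lengths 3 and 2.
position_ids = [0, 1, 2, 0, 1]    # position index of each token;
                                  # indices restart at 0 per packed sequence
attention_mask = [1, 1, 1, 1, 1]  # mask semantics: 1 = valid, 0 = padding

# Sequence boundaries can be recovered from where position_ids reset to 0,
# something mask flags cannot express.
seq_starts = [i for i, p in enumerate(position_ids) if p == 0]
print(seq_starts)  # → [0, 3]
```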

Testing

  • Docstring-only change, no code behavior affected

This contribution was developed with AI assistance (Claude Code).

The docstring for position_ids incorrectly described attention_mask
semantics ("1 means valid and 0 means not valid"). Updated to
accurately describe position indices.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@vasqu vasqu left a comment


Thanks, let's update the strings to something a bit simpler

Before:
> Boolean or int tensor of shape (batch_size, sequence_length), 1 means valid and 0 means not valid.

After:
> Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
> `[0, config.n_positions - 1]`. Shape: (batch_size, sequence_length).

Contributor

Suggested change
Indices of positions of each input sequence tokens. Shape: (batch_size, sequence_length).

I'd rather follow

position_ids (`torch.LongTensor`, *optional*)
Indices of positions of each input sequence tokens.

The generic docstring is a bit outdated imo, position embeddings sounds like the old absolute embeddings + the range can be indefinite since rope can be extended etc

Contributor

Same below then

Member

@stevhliu stevhliu left a comment


one minor comment, but lgtm!

Comment on lines +368 to +369
Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
`[0, config.n_positions - 1]`. Shape: (batch_size, sequence_length).
Member

Suggested change:
- Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
- `[0, config.n_positions - 1]`. Shape: (batch_size, sequence_length).
+ Tensor of shape `(batch_size, sequence_length)` containing position indices of each input sequence tokens in the position embeddings.

Drop "position embeddings" reference and range constraint to match
the codebase pattern in generic.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author

mvanhorn commented Mar 9, 2026

Thanks @vasqu and @stevhliu for the reviews! Simplified both docstrings to drop the "position embeddings" reference and range constraint per @vasqu's suggestion - updated in both prepare_fa_kwargs_from_position_ids and _prepare_from_posids. Pushed in 64de4f4.
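As a sketch of the outcome (wording assumed from the review thread, not the exact file contents), the simplified docstring would look roughly like:

```python
def prepare_fa_kwargs_from_position_ids(position_ids):
    """
    Sketch only; signature and wording are assumptions, not the real file.

    position_ids (`torch.LongTensor`):
        Indices of positions of each input sequence tokens.
        Shape: (batch_size, sequence_length).
    """
    ...
```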


Development

Successfully merging this pull request may close these issues.

Wrong docstring for position_ids

3 participants