
Fix position_ids docstring in modeling_flash_attention_utils.py#44547

Open
mvanhorn wants to merge 2 commits into huggingface:main from mvanhorn:osc/44373-fix-position-ids-docstring

Conversation


@mvanhorn mvanhorn commented Mar 9, 2026

Fixes #44373

Summary

  • Corrected the docstring for the position_ids parameter in prepare_fa_kwargs_from_position_ids and _prepare_from_posids, which incorrectly described attention-mask semantics ("Boolean or int tensor... 1 means valid and 0 means not valid")
  • The docstring now accurately describes position indices behavior
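To make the distinction concrete, here is a hypothetical illustration (plain Python, not taken from the PR diff) of why position indices are not interchangeable with attention-mask flags; restarting indices per packed sequence is the case the flash-attention helpers handle:

```python
# Hypothetical values: one row packing two sequences of lengths 3 and 2.
position_ids = [0, 1, 2, 0, 1]    # position index of each token;
                                  # indices restart at 0 per packed sequence
attention_mask = [1, 1, 1, 1, 1]  # mask semantics: 1 = valid, 0 = padding

# Sequence boundaries can be recovered from where position_ids reset to 0,
# something mask flags cannot express.
seq_starts = [i for i, p in enumerate(position_ids) if p == 0]
print(seq_starts)  # → [0, 3]
```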

Testing

  • Docstring-only change, no code behavior affected

This contribution was developed with AI assistance (Claude Code).

The docstring for position_ids incorrectly described attention_mask
semantics ("1 means valid and 0 means not valid"). Updated to
accurately describe position indices.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@vasqu vasqu left a comment


Thanks, let's update the strings to something a bit simpler

Before:
> Boolean or int tensor of shape (batch_size, sequence_length), 1 means valid and 0 means not valid.

After:
> Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
> `[0, config.n_positions - 1]`. Shape: (batch_size, sequence_length).

Contributor

Suggested change
Indices of positions of each input sequence tokens. Shape: (batch_size, sequence_length).

I'd rather follow

position_ids (`torch.LongTensor`, *optional*)
Indices of positions of each input sequence tokens.

The generic docstring is a bit outdated imo, position embeddings sounds like the old absolute embeddings + the range can be indefinite since rope can be extended etc

Contributor

Same below then

Member

@stevhliu stevhliu left a comment


one minor comment, but lgtm!

Comment on lines +368 to +369
Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
`[0, config.n_positions - 1]`. Shape: (batch_size, sequence_length).
Member

Suggested change:
- Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
- `[0, config.n_positions - 1]`. Shape: (batch_size, sequence_length).
+ Tensor of shape `(batch_size, sequence_length)` containing position indices of each input sequence tokens in the position embeddings.

Drop "position embeddings" reference and range constraint to match
the codebase pattern in generic.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author

mvanhorn commented Mar 9, 2026

Thanks @vasqu and @stevhliu for the reviews! Simplified both docstrings to drop the "position embeddings" reference and range constraint per @vasqu's suggestion - updated in both prepare_fa_kwargs_from_position_ids and _prepare_from_posids. Pushed in 64de4f4.
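As a sketch of the outcome (wording assumed from the review thread, not the exact file contents), the simplified docstring would look roughly like:

```python
def prepare_fa_kwargs_from_position_ids(position_ids):
    """
    Sketch only; signature and wording are assumptions, not the real file.

    position_ids (`torch.LongTensor`):
        Indices of positions of each input sequence tokens.
        Shape: (batch_size, sequence_length).
    """
    ...
```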


Development

Successfully merging this pull request may close these issues.

Wrong docstring for position_ids

3 participants