Save size in scalar scratch for bo and bq #1201

rupengliu-meta · 2025-12-01T18:44:39Z

Description

In bo and bq, we can save the size in SMEM once instead of recalculating it, which removes unnecessary scalar computation.
Tests have passed for both kernels


Tests

Ran unit tests and did local e2e testing.

Checklist

Before submitting this PR, please make sure:

I have performed a self-review of my code.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have made or will make corresponding changes to any relevant documentation.

Signed-off-by: rupengliu-meta <rupengliu@meta.com>

yaochengji

Thanks for the contribution! I think the trade-off is between scalar computation and scalar load/store. Do you have any performance numbers after the modification?

rupengliu-meta · 2025-12-01T19:17:42Z

> Thanks for the contribution! I think the trade-off is between scalar computation and scalar load/store. Do you have any performance numbers after the modification?

Yes, I will update the perf numbers later.

rupengliu-meta · 2025-12-02T00:44:17Z

The throughput improvement seems fairly small, but it is consistently around 1%-2%. Tested with the kernel benchmarking script (not e2e).

vanbasten23 · 2025-12-02T04:37:36Z

tpu_inference/kernels/ragged_paged_attention/v3/kernel.py

            input_output_aliases={
-                7: 0,
-                9: 1
+                8: 0,


Do you know why queries is not in donate_argnames for jax.jit?

It depends on whether it is used again after the attention. I took a quick look and didn't find anywhere it is reused. If the queries are not reused, we could donate them. @bythew3i, thoughts?
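A sketch of how the two mechanisms in this thread fit together, assuming a simplified kernel. The input_output_aliases keys are positional indices into the pallas_call operands, so inserting an operand ahead of an aliased one shifts that operand's index. That is one possible reason for the 7 → 8 change in the diff, although the snippet does not show which operand was added. donate_argnames on jax.jit tells XLA that the caller will not reuse the donated array, so the aliased output can be written into that buffer rather than into a copy. As noted above, this is only safe if queries is not used again after attention. The names scale_kernel and scale_queries and the scale operand are hypothetical.

```python
import functools

import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl
from jax.experimental.pallas import tpu as pltpu


def scale_kernel(scale_ref, q_ref, o_ref):
    # Operand 0 is a scalar in SMEM; operand 1 is the queries block.
    o_ref[...] = q_ref[...] * scale_ref[0]


@functools.partial(jax.jit, donate_argnames=("queries",))
def scale_queries(scale, queries):
    return pl.pallas_call(
        scale_kernel,
        out_shape=jax.ShapeDtypeStruct(queries.shape, queries.dtype),
        in_specs=[
            pl.BlockSpec(memory_space=pltpu.SMEM),
            pl.BlockSpec(memory_space=pltpu.VMEM),
        ],
        out_specs=pl.BlockSpec(memory_space=pltpu.VMEM),
        # Key = operand index (queries is operand 1), value = output index.
        # Adding another operand before queries would make this {2: 0}.
        input_output_aliases={1: 0},
        interpret=True,  # run on CPU for illustration; drop on a real TPU
    )(scale, queries)
```

After the call, the caller must not touch the array passed as queries. Backends that honor donation invalidate its buffer, and backends that do not support donation only emit a warning.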

kyuyeunk

Isn't this change applicable to bkv as well, i.e., saving the bkv size to a scalar scratch?

rupeng-liu · 2025-12-03T20:12:49Z

@kyuyeunk Yep, good idea. I just checked the bkv size: it is offset + bkv_sz_frm_new. When wait is False there is no existing value for it, so if we added this to the wait=False path we would still need to do the extra calculation. So this might not be applicable to bkv?

kyuyeunk · 2025-12-06T08:19:43Z

> Thanks for the contribution! I think the trade-off is between scalar computation and scalar load/store. Do you have any performance numbers after the modification?
>
> Yes, I will update the perf numbers later.

Ping on updating the perf numbers in the PR description.

Save size in scalar for bo and bq

18e55a4

Signed-off-by: rupengliu-meta <rupengliu@meta.com>

rupengliu-meta marked this pull request as ready for review December 1, 2025 18:46

rupengliu-meta requested review from bythew3i, kyuyeunk and yaochengji as code owners December 1, 2025 18:46

rupengliu-meta changed the title from "Save size in scalar for bo and bq" to "Save size in scalar scratch for bo and bq" Dec 1, 2025

Merge branch 'main' into rupliu/k2

c2b91a7

yaochengji reviewed Dec 1, 2025


Merge branch 'main' into rupliu/k2

4920fb7

vanbasten23 reviewed Dec 2, 2025


kyuyeunk reviewed Dec 2, 2025


Merge branch 'main' into rupliu/k2

90f185c


5 participants
