Conversation
seed_t = sync_tensor(seed_t, dim=0, group=None)
seed_t = seed_t.chunk(world_size, dim=0)[0]
seed = seed_t.item()
seed -= torch.iinfo(torch.int64).min
Bug: Incorrect seed calculation produces excessively large values
The seed calculation subtracts torch.iinfo(torch.int64).min (which equals -2^63) from the seed, effectively adding 2^63. Since torch.randint already produces non-negative values in [0, 2^63-1), this subtraction results in seed values in [2^63, 2^64-1), which are extremely large. This appears unintentional - the seed is already suitable for manual_seed() without this transformation. The unnecessary arithmetic could cause overflow issues or unexpected behavior with the random number generator.
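If that reading is correct, the fix is simply to drop the offset. A minimal sketch, assuming `sync_tensor` all-gathers one seed per rank along dim 0 as the diff suggests (`make_shared_seed` is an illustrative name, not code from the PR):

```python
import torch

def make_shared_seed(sync_tensor, world_size: int) -> int:
    # Each rank draws a candidate seed in [0, 2**63 - 1), which torch.manual_seed accepts as-is.
    seed_t = torch.randint(0, torch.iinfo(torch.int64).max, (1,), dtype=torch.int64)
    seed_t = sync_tensor(seed_t, dim=0, group=None)  # gather every rank's candidate seed
    seed_t = seed_t.chunk(world_size, dim=0)[0]      # all ranks keep rank 0's value
    return seed_t.item()                             # no `-= torch.iinfo(torch.int64).min` needed

# torch.manual_seed(make_shared_seed(sync_tensor, world_size))
```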
    torch.Tensor
        The gradient of the output tensor.
    """
    return ring_attention._scaled_dot_product_ring_flash_attention_backward(*args, **kwargs)
Bug: Incomplete backward pass missing saved tensors for gradient computation
The LocalFunc autograd function's backward method is incomplete. The forward method doesn't call ctx.save_for_backward() to save the tensors needed for gradient computation (mesh, query, key, value, output, lse). The backward method only receives gradient outputs via *args and passes them directly to _scaled_dot_product_ring_flash_attention_backward, but this function typically requires the original inputs and outputs to compute input gradients. This would cause training (backward pass) to fail with incorrect arguments or missing data.
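A hedged sketch of what a complete wrapper could look like; `ring_fwd` and `ring_bwd` are stand-ins for the experimental ring-attention forward/backward ops (their real signatures differ across torch versions), and the exact saved-tensor list is an assumption:

```python
import torch

class LocalFuncSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, mesh, query, key, value):
        out, lse = ring_fwd(mesh, query, key, value)        # stand-in for the ring forward op
        ctx.mesh = mesh                                     # non-tensor state lives on ctx
        ctx.save_for_backward(query, key, value, out, lse)  # tensors the backward pass needs
        return out

    @staticmethod
    def backward(ctx, grad_out):
        query, key, value, out, lse = ctx.saved_tensors
        grad_q, grad_k, grad_v = ring_bwd(                  # stand-in for the ring backward op
            ctx.mesh, grad_out, query, key, value, out, lse
        )
        return None, grad_q, grad_k, grad_v                 # no gradient w.r.t. the mesh
```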
johannaSommer left a comment:
LGTM! There is one hook missing in smash.py that checks for this ring attention algorithm and spawns the distributed server, otherwise no notes 🌻
This PR has been inactive for 10 days and is now marked as stale.
Force-pushed from c15480e to 38ced18.
This PR has been inactive for 10 days and is now marked as stale.
Force-pushed from 38ced18 to 1ea806e.
@johannaSommer could you take a quick look? I've made the changes that you pointed out.
# perform any necessary setup steps before the smashing process begins
execute_algorithm_pre_smash_hooks(model, smash_config, algorithm_order)

# ring_attn needs a process group; if we're not already under torchrun/torch.distributed,
LGTM! Sorry for being naggy about it, but could we move this into a function somewhere in the distributed utils? I feel like the smash function is the entry point for a lot of people trying to understand the code, so I would love to keep it as lean as possible.
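One possible shape for that helper, as a hedged sketch (the function name, its location in the distributed utils, and the single-rank environment defaults are assumptions, not the PR's actual code):

```python
import os
import torch.distributed as dist

def ensure_default_process_group(backend: str = "gloo") -> None:
    """Initialize a default process group when running outside torchrun (illustrative helper)."""
    if dist.is_available() and not dist.is_initialized():
        # Fall back to a local single-rank setup when no launcher populated the
        # rendezvous environment variables.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        os.environ.setdefault("RANK", "0")
        os.environ.setdefault("WORLD_SIZE", "1")
        dist.init_process_group(backend=backend)
```

smash() could then call this helper only when ring_attn is among the selected algorithms, keeping the entry point as lean as requested.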
Description
Adding ring_attn algorithm
Type of Change
How Has This Been Tested?
I ran the tests
Checklist