Conversation

@vanbasten23 (Collaborator)

Description

This PR adds a sequence parallelism (SP) end-to-end test and wires it into the CI.

Tests

pytest -s -vv tests/e2e/test_sequence_parallelism.py

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have added necessary comments to my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
@vanbasten23 vanbasten23 marked this pull request as ready for review December 2, 2025 01:08

token_num = x.shape[0]
# NOTE(chengjiyao): make sure the sharded token_num is at least TPU_SECOND_LAST_MINOR
if token_num // self.mesh.shape["model"] >= TPU_SECOND_LAST_MINOR:
    logger.info(
Collaborator
In our e2e test, should we check the log to see that SP is actually enabled? Another way is to check the final optimized graph, but that's more difficult.
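
For the optimized-graph route, a rough sketch of what that could look like with JAX's AOT API (the function here is a toy stand-in, not the actual model):

import jax
import jax.numpy as jnp

def f(x):  # toy stand-in for the sharded model forward pass
    return x * 2.0

compiled = jax.jit(f).lower(jnp.ones((64, 128))).compile()
hlo_text = compiled.as_text()
# For the real sharded computation, one could grep the optimized HLO for the
# collectives that SP introduces, e.g.:
sp_active = "all-gather" in hlo_text or "reduce-scatter" in hlo_text
print(sp_active)  # False here, since the toy function has no collectives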

Collaborator Author @vanbasten23 commented Dec 2, 2025
I checked manually that this line (line 76) is executed. But I think you meant whether we can check it in the test. I'm not sure how we can do that in the test. Let me know if you have some ideas.

Though, I added a long prompt (line 24, "Three Rings...") to ensure token_num // 8 >= 8 is triggered. Also, the precompilation phase uses a very large num_tokens, so this case is triggered there as well.
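
For reference, a standalone sketch of that gating (the constants are assumptions for illustration; the real values come from the mesh and the TPU tiling constraints):

TPU_SECOND_LAST_MINOR = 8  # assumed minimum shard size for the second-to-last dim
MODEL_AXIS_SIZE = 8        # assumed mesh.shape["model"], e.g. 8 chips

def sp_branch_taken(token_num: int) -> bool:
    # Shard the token dimension only if each shard still meets the minimum
    # tile size; otherwise the replicated path is used.
    return token_num // MODEL_AXIS_SIZE >= TPU_SECOND_LAST_MINOR

assert sp_branch_taken(64)      # 64 // 8 == 8, so the SP branch is taken
assert not sp_branch_taken(32)  # 32 // 8 == 4, too small to shard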

Collaborator

Can we output the log to a file and check the file's contents later?
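
Something like this sketch, maybe (the logger name and message are placeholders for whatever the logger.info above actually emits):

import logging
import pathlib
import tempfile

logger = logging.getLogger("tpu_inference")  # placeholder logger name
logger.setLevel(logging.INFO)

log_path = pathlib.Path(tempfile.mkdtemp()) / "sp_test.log"
logger.addHandler(logging.FileHandler(log_path))

# Stand-in for running the model with SP enabled, which would emit the line.
logger.info("sequence parallelism enabled: sharding token_num over model axis")

assert "sequence parallelism enabled" in log_path.read_text()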

Collaborator

Could it be logger.debug instead of logger.info?

Collaborator Author

No luck for now; somehow I couldn't capture the logger. I agree that this may be the easiest way to test, but if someone writes a similar logging string, the test may not work as intended. Other parallelism strategies seem to have the same issue: it's hard to examine how each layer is sharded in a test.

I've been thinking about whether there is a better way to test. Since SP's main benefit is reduced memory usage, we could check whether memory usage actually drops with SP enabled. But there is no JAX API that lets me check the memory usage.
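
(jax.Device.memory_stats() might be an option; it returns allocator stats on some backends and None on others, and I haven't verified it on our setup. A sketch, assuming it works:)

import jax

def bytes_in_use() -> int:
    stats = jax.local_devices()[0].memory_stats()  # None on unsupported backends
    return stats["bytes_in_use"] if stats else 0

# Hypothetical check: run the same workload with and without SP and expect
# the in-use (or peak) memory to be lower when SP is enabled.
print(bytes_in_use())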

How about we merge this PR so that it verifies the test runs to completion with "enable_sequence_parallelism=True" and "tensor_parallelism=8", since that is the intended way to enable SP. Then, when we do the integration, we can improve the test. Wdyt?
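
i.e. roughly this shape, where make_engine stands in for however the e2e test constructs the engine (schematic, not the real harness):

def make_engine(**kwargs):
    # Placeholder factory; the real test drives the actual serving engine
    # with the two flags below.
    class _Engine:
        def generate(self, prompts):
            return [f"output for: {p}" for p in prompts]
    return _Engine()

def test_sp_runs_to_completion():
    engine = make_engine(enable_sequence_parallelism=True, tensor_parallelism=8)
    outputs = engine.generate(["Three Rings for the Elven-kings under the sky"])
    assert len(outputs) == 1  # we only check completion, not output quality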

