@laneeeee commented Dec 4, 2025

No description provided.

: ProcessGroup(device) {
c10::intrusive_ptr<c10d::ProcessGroupNCCL::Options> pg_options =
c10d::ProcessGroupNCCL::Options::create();
#if TORCH_VERSION_MAJOR >= 2 && TORCH_VERSION_MINOR >= 7
Collaborator

If you're using torch >= 2.7, there's no need to add this line.

Author

ok
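
For reference, if a version guard like the one in the diff hunk is kept anywhere, comparing the major and minor numbers independently (`MAJOR >= 2 && MINOR >= 7`) would reject a hypothetical 3.0 release. A minimal sketch of a lexicographic comparison follows. The helper names `version_at_least` and `naive_check` are illustrative assumptions and not part of PyTorch or xllm, and the fallback macro values exist only so the snippet compiles without the PyTorch headers.

```cpp
// In a real build TORCH_VERSION_MAJOR / TORCH_VERSION_MINOR come from
// <torch/version.h>. The fallback values below are assumptions so that this
// sketch compiles on its own.
#ifndef TORCH_VERSION_MAJOR
#define TORCH_VERSION_MAJOR 2
#endif
#ifndef TORCH_VERSION_MINOR
#define TORCH_VERSION_MINOR 7
#endif

// Hypothetical helper: true when (major, minor) >= (req_major, req_minor),
// compared lexicographically.
constexpr bool version_at_least(int major, int minor,
                                int req_major, int req_minor) {
  return major > req_major || (major == req_major && minor >= req_minor);
}

// The shape of the check used in the diff, kept here only for comparison.
constexpr bool naive_check(int major, int minor) {
  return major >= 2 && minor >= 7;
}

// Both checks agree on 2.7 and 2.6 ...
static_assert(version_at_least(2, 7, 2, 7), "2.7 satisfies >= 2.7");
static_assert(!version_at_least(2, 6, 2, 7), "2.6 does not satisfy >= 2.7");
static_assert(naive_check(2, 7), "naive check accepts 2.7");
// ... but they disagree on a hypothetical 3.0.
static_assert(version_at_least(3, 0, 2, 7), "3.0 satisfies >= 2.7");
static_assert(!naive_check(3, 0), "naive check wrongly rejects 3.0");

// Example use with the real macros (or the fallbacks above).
constexpr bool kTorchAtLeast27 =
    version_at_least(TORCH_VERSION_MAJOR, TORCH_VERSION_MINOR, 2, 7);
```

The `static_assert` lines are evaluated at compile time, so the snippet verifies itself when compiled. Whether the guard is needed at all is the reviewer's call above.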

@laneeeee force-pushed the main branch 2 times, most recently from d5510a7 to c62963f on December 4, 2025 08:40

namespace xllm::kernel::ilu {

void layer_norm(at::Tensor& input,
Collaborator

Add fused_norm ops.

Author

[image] I see that the CUDA implementation here is layer_norm. Does that mean the CUDA version runs incorrectly?

3 participants