flashcomm3 #3

wyu0-0 · 2025-08-20T03:59:16Z

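Design goals

Communication-computation overlap: split the MoE compute chain into two independent sub-streams so that Dispatch/Combine (communication stream) and MatMul/SwiGLU (compute stream) run in parallel at the hardware level.
Expert-level load balancing: the expert groups must carry balanced compute loads to avoid long-tail effects.
Zero semantic change: the output must be mathematically equivalent to single-stream mode.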
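The load-balancing goal can be illustrated with a small sketch. It assumes that the number of tokens routed to each expert is a reasonable proxy for compute cost and uses a greedy "heaviest first" partition. The function names and the token counts are hypothetical and are not taken from this PR, which may split experts differently (for example, statically by expert index).

```python
from typing import Dict, List, Tuple


def split_experts_balanced(tokens_per_expert: Dict[int, int]) -> Tuple[List[int], List[int]]:
    """Greedily split experts into two groups with similar total load.

    tokens_per_expert maps a (hypothetical) expert id to the number of tokens
    routed to it, used here as a stand-in for compute cost.
    """
    group_a: List[int] = []
    group_b: List[int] = []
    load_a = 0
    load_b = 0
    # Visit the heaviest experts first and hand each one to the group that is
    # currently lighter; ties go to group A.
    ordered = sorted(tokens_per_expert.items(), key=lambda kv: (-kv[1], kv[0]))
    for expert_id, load in ordered:
        if load_a <= load_b:
            group_a.append(expert_id)
            load_a += load
        else:
            group_b.append(expert_id)
            load_b += load
    return group_a, group_b


def group_load(group: List[int], tokens_per_expert: Dict[int, int]) -> int:
    """Total number of tokens handled by one group."""
    return sum(tokens_per_expert[e] for e in group)


# Made-up routing statistics for six experts.
example_loads = {0: 90, 1: 10, 2: 40, 3: 35, 4: 30, 5: 5}
example_a, example_b = split_experts_balanced(example_loads)
print(example_a, group_load(example_a, example_loads))
print(example_b, group_load(example_b, example_loads))
```

A greedy split like this is not guaranteed to be optimal, but it keeps one hot expert from dominating a group, which is the long-tail case the goal above is meant to avoid.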

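Technical architecture

In a typical MoE model, the dispatch and combine operators sit in the MoE layer, and the operators are scheduled in the following order:
[token] → [dispatch] → [gmm] → [swiglu] → [gmm] → [combine]
Here dispatch and combine are communication steps, and [gmm] → [swiglu] → [gmm] is the expert computation.
The current single-stream flow is as follows: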

graph LR

    subgraph main [Main stream]
    C --> E[Expert computation]
    E --> G[Combine communication]
    end

    A[Input] --> B[Expert routing]
    B --> C[Dispatch communication]
    G --> I[Result aggregation]
    I --> J[Output]

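Ascend chips can use their tensor/vector compute units, communication resources, and memory-access resources in parallel to maximize compute and bandwidth utilization. Tasks are submitted to the hardware engines serially within each stream; when tasks have no dependencies and no resource conflicts, the multi-stream mechanism can execute them in parallel and raise resource utilization. Here we split the experts into two groups, GroupA and GroupB, and overlap the communication of one group with the computation of the other to improve both communication-bandwidth and compute utilization. The dual-stream parallel flow is as follows: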

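graph LR

    subgraph main [Main stream]
    B1 --> C1[Expert computation]
    C1 --> E1[Combine communication]
    end

    subgraph aux [Auxiliary stream]
    B2 --> C2[Expert computation]
    C2 --> E2[Combine communication]
    end

    A[Input] --> A1{Balanced expert split}
    A1 --> |Group A| B1[Dispatch communication]
    A1 --> |Group B| B2[Dispatch communication]
    E1 --> F[Result aggregation]
    E2 --> F
    F --> J[Output]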
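The zero-semantic-change requirement for this dual-stream flow can be checked on a toy model. The sketch below is only an illustration: scalar weights stand in for the grouped matmuls, two Python threads stand in for the main and auxiliary device streams, and all names (`run_group`, `dual_stream`, and so on) are hypothetical rather than taken from this PR.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def silu(x):
    return x / (1.0 + math.exp(-x))


def expert_forward(weights, token):
    """Toy gmm -> swiglu -> gmm: scalar weights replace the weight matrices."""
    gate, up, down = weights
    return [down * silu(gate * x) * (up * x) for x in token]


def run_group(expert_ids, experts, tokens, routing):
    """Dispatch, compute and combine, restricted to one group of experts."""
    group = set(expert_ids)
    out = [[0.0] * len(t) for t in tokens]
    for t_idx, token in enumerate(tokens):
        for expert_id, weight in routing[t_idx]:
            if expert_id not in group:
                continue  # this expert belongs to the other group
            y = expert_forward(experts[expert_id], token)
            out[t_idx] = [acc + weight * v for acc, v in zip(out[t_idx], y)]
    return out


def single_stream(experts, tokens, routing):
    """Reference path: every expert is handled in one pass."""
    return run_group(experts.keys(), experts, tokens, routing)


def dual_stream(experts, tokens, routing, group_a, group_b):
    """Run the two groups concurrently, then aggregate the partial outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(run_group, group_a, experts, tokens, routing)
        fut_b = pool.submit(run_group, group_b, experts, tokens, routing)
        part_a, part_b = fut_a.result(), fut_b.result()
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(part_a, part_b)]


# Made-up experts (gate, up, down), tokens and top-2 routing weights.
experts = {0: (0.5, 1.0, 2.0), 1: (-1.0, 0.3, 1.5), 2: (0.8, -0.7, 0.4), 3: (1.2, 0.9, -1.1)}
tokens = [[1.0, -2.0], [0.5, 3.0], [-1.5, 0.25]]
routing = [[(0, 0.6), (3, 0.4)], [(1, 0.7), (2, 0.3)], [(2, 0.5), (3, 0.5)]]

reference = single_stream(experts, tokens, routing)
overlapped = dual_stream(experts, tokens, routing, group_a=[0, 1], group_b=[2, 3])
matches = all(
    math.isclose(r, o, rel_tol=1e-12, abs_tol=1e-12)
    for row_r, row_o in zip(reference, overlapped)
    for r, o in zip(row_r, row_o)
)
print("dual-stream output matches single-stream:", matches)
```

The comparison uses a tolerance because splitting the experts changes the order of the floating-point additions in the aggregation step. With more than two experts per token, a real implementation would therefore be expected to match the single-stream result up to rounding rather than bit for bit.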

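Current status: on hold

DONE: dual-stream parallelism based on the original dispatch and combine operators has been implemented.
TODO: dispatch and combine use AICore during communication, so these two fused operators are not pure communication operators and cannot truly run in parallel with the compute chain [gmm] → [swiglu] → [gmm] defined above. To achieve communication-computation dual-stream parallelism, the fused dispatch and combine operators need to be split into a communication part and a compute part; once the compute part finishes, it releases the compute resources so that the MoE compute chain can use them at the same time. Adaptation will continue once these operators are available (the operator request has been submitted), and performance testing will follow.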
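The effect described in the TODO can be illustrated with a toy scheduling model. This is a sketch under stated assumptions, not a measurement: the durations, the resource names `comm` and `aicore`, and the order of the communication and compute parts inside a split dispatch or combine are all made up. Each stream runs its tasks in order, and a task starts only when every resource it needs is free.

```python
from typing import Dict, FrozenSet, List, Tuple

# (name, duration, resources held for the whole duration); all values hypothetical.
Task = Tuple[str, float, FrozenSet[str]]


def makespan(streams: List[List[Task]]) -> float:
    """Total time when each stream is serial and resources are exclusive."""
    resource_free: Dict[str, float] = {}
    stream_ready = [0.0] * len(streams)
    next_idx = [0] * len(streams)
    finish = 0.0
    while True:
        candidates = []
        for s, tasks in enumerate(streams):
            if next_idx[s] < len(tasks):
                _, duration, resources = tasks[next_idx[s]]
                start = max([stream_ready[s]] + [resource_free.get(r, 0.0) for r in resources])
                candidates.append((start, s, duration, resources))
        if not candidates:
            return finish
        # Commit the task that can start earliest; ties go to the lower stream index.
        start, s, duration, resources = min(candidates, key=lambda c: (c[0], c[1]))
        end = start + duration
        for r in resources:
            resource_free[r] = end
        stream_ready[s] = end
        next_idx[s] += 1
        finish = max(finish, end)


COMM = frozenset({"comm"})
AICORE = frozenset({"aicore"})
BOTH = frozenset({"comm", "aicore"})

# Fused operators: dispatch and combine hold AICore for their whole duration.
fused_group = [("dispatch", 2.0, BOTH), ("experts", 3.0, AICORE), ("combine", 2.0, BOTH)]
# Split operators: only a short compute part of dispatch/combine holds AICore.
split_group = [
    ("dispatch_compute", 0.5, AICORE),
    ("dispatch_comm", 1.5, COMM),
    ("experts", 3.0, AICORE),
    ("combine_comm", 1.5, COMM),
    ("combine_compute", 0.5, AICORE),
]

fused_time = makespan([fused_group, fused_group])
split_time = makespan([split_group, split_group])
print("fused:", fused_time, "split:", split_time)
```

In this model the fused variant takes 14 time units, the same as running both groups back to back, because every task needs AICore. The split variant takes 10, since the communication of one group overlaps with the expert computation of the other. The real gain depends on the actual operator timings, which have not been measured yet.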

wyu0-0 added 6 commits July 22, 2025 19:13

refactor fused_experts_with_mc2

25c6d76

single ctx final

e846a84

dual stream

d844918

split expert

0757d65

debug

2be0e87

dual batch

9f8a730
