[CUDA] GPT-OSS-20B Throughput Optimization #29160

This issue tracks the progress of GPT-OSS-20B throughput optimization.

Related PRs:
* olive-recipes
https://github.com/microsoft/olive-recipes/pull/507

Recipe experiments: https://github.com/tianleiwu/olive-recipes/blob/tlwu/gpt-oss-20b/gpt-oss-20b/gpt_oss_20b_experiments.md

* onnxruntime-genai
https://github.com/microsoft/onnxruntime-genai/pull/2234

* CUDA kernel improvements
https://github.com/microsoft/onnxruntime/pull/29038
https://github.com/microsoft/onnxruntime/pull/29161
https://github.com/microsoft/onnxruntime/pull/29162
https://github.com/microsoft/onnxruntime/pull/29166
https://github.com/microsoft/onnxruntime/pull/29177
https://github.com/microsoft/onnxruntime/pull/29167 (Experiment)

* fusion
https://github.com/microsoft/onnxruntime/pull/29186
https://github.com/microsoft/onnxruntime/pull/29170
QMoE router fusion (Experiment)
