Commit 05e4b16

Update support matrices (#1232)
Signed-off-by: Teresa Chen <boe20211@gmail.com>
1 parent 0b4594c commit 05e4b16

6 files changed, +55 -9 lines changed
docs/recommended_models_features.md

Lines changed: 18 additions & 0 deletions
@@ -24,3 +24,21 @@ These tables show the models currently tested for accuracy and performance.
 This table shows the features currently tested for accuracy and performance.
 
 {{ read_csv('../support_matrices/feature_support_matrix.csv', keep_default_na=False) }}
+
+## Kernel Support
+
+This table shows the current kernel support status.
+
+{{ read_csv('../support_matrices/kernel_support_matrix.csv', keep_default_na=False) }}
+
+## Parallelism Support
+
+This table shows the current parallelism support status.
+
+{{ read_csv('../support_matrices/parallelism_support_matrix.csv', keep_default_na=False) }}
+
+## Quantization Support
+
+This table shows the current quantization support status.
+
+{{ read_csv('../support_matrices/quantization_support_matrix.csv', keep_default_na=False) }}
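
Every `read_csv` call in this file passes `keep_default_na=False`. Assuming the docs macro forwards its keyword arguments to `pandas.read_csv` (an assumption; the commit does not say so), the flag matters because several matrix cells contain the literal text `N/A`, which pandas would otherwise parse as a missing value. A minimal sketch of the difference, using two rows taken from the feature matrix in this commit:

```python
import io

import pandas as pd

# Two rows copied from the feature support matrix in this commit.
CSV_TEXT = (
    "Feature,CorrectnessTest,PerformanceTest\n"
    '"Chunked Prefill",✅,✅\n'
    '"sampling_params",✅,N/A\n'
)


def load(keep_default_na: bool) -> pd.DataFrame:
    """Parse the sample CSV with or without pandas' default NA strings."""
    return pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=keep_default_na)


default_df = load(True)    # "N/A" is converted to NaN
literal_df = load(False)   # "N/A" stays a plain string

print(default_df["PerformanceTest"].isna().tolist())
print(literal_df["PerformanceTest"].tolist())
```

With the default setting the `N/A` cell becomes `NaN`, which typically renders as an empty or `nan` cell; with `keep_default_na=False` the table shows `N/A` as written.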
Lines changed: 12 additions & 7 deletions
@@ -1,11 +1,16 @@
 Feature,CorrectnessTest,PerformanceTest
-"Collective Communication Matmul",✅,N/A
-"Prefix Caching",✅,✅
-"Multimodal Inputs",✅,✅
-"Quantized Matmul Attention and KV Cache",✅,✅
 "Chunked Prefill",✅,✅
-"JAX-Path Qxix Quantization",✅,✅
+"DCN-based P/D disaggregation",to be added,to be added
+"KV cache host offloading",to be added,to be added
+"Llama 4 Maverick",to be added,to be added
+"LoRA_Torch",✅,to be added
+"Multimodal Inputs",✅,✅
+"Out-of-tree model support",✅,✅
+"Prefix Caching",✅,✅
 "Single Program Multi Data",✅,✅
+"Speculative Decoding: Eagle3",✅,✅
 "Speculative Decoding: Ngram",✅,✅
-"Structured Decoding",✅,N/A
-"Ragged Paged Attention V3",✅,✅
+"async scheduler",✅,✅
+"runai_model_streamer_loader",✅,N/A
+"sampling_params",✅,N/A
+"structured_decoding",✅,N/A
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+Feature,CorrectnessTest,PerformanceTest
+"Collective Communication Matmul",✅,to be added
+"MLA",to be added,to be added
+"MoE",to be added,to be added
+"Quantized Attention",to be added,to be added
+"Quantized KV Cache",to be added,to be added
+"Quantized Matmul",to be added,to be added
+"Ragged Paged Attention V3",✅,✅
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+Feature,CorrectnessTest,PerformanceTest
+"CP",to be added,to be added
+"DP",❌,N/A
+"EP",to be added,to be added
+"PP",✅,✅
+"SP",to be added,to be added
+"TP",to be added,to be added
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+Feature,Recommended TPU Generations,CorrectnessTest,PerformanceTest
+"AWQ INT4","v5, v6",to be added,to be added
+"FP4 W4A16",v7,to be added,to be added
+"FP8 W8A8",v7,to be added,to be added
+"FP8 W8A16",v7,to be added,to be added
+"INT4 W4A16","v5, v6",to be added,to be added
+"INT8 W8A8","v5, v6",to be added,to be added

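The quantization matrix quotes `"v5, v6"` so that the embedded comma stays inside a single field, and the status columns in this commit use a small set of values. The sketch below, using only the Python standard library, shows one way such a file could be sanity-checked before it is rendered. The `ALLOWED_STATUS` set and the `check_rows` helper are hypothetical: they are inferred from the values visible in this commit and are not part of the repository.

```python
import csv
import io

# Two rows copied from the quantization support matrix in this commit.
CSV_TEXT = (
    "Feature,Recommended TPU Generations,CorrectnessTest,PerformanceTest\n"
    '"AWQ INT4","v5, v6",to be added,to be added\n'
    '"FP8 W8A8",v7,to be added,to be added\n'
)

# Assumption: status values inferred from the cells seen in this commit.
ALLOWED_STATUS = {"✅", "❌", "N/A", "to be added"}


def check_rows(text: str) -> list[tuple[int, str, str]]:
    """Return (file line number, column, value) for each unexpected status cell."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for column in ("CorrectnessTest", "PerformanceTest"):
            if row[column] not in ALLOWED_STATUS:
                problems.append((line_no, column, row[column]))
    return problems


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
print(rows[0]["Recommended TPU Generations"])  # quoted field keeps its comma
print(check_rows(CSV_TEXT))
```

A check like this could catch a typo in a status cell, or an unquoted comma that shifts the columns, before the table reaches the rendered docs.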
support_matrices/text_only_model_support_matrix.csv

Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,8 @@
 Model,UnitTest,IntegrationTest,Benchmark
 "meta-llama/Llama-3.3-70B-Instruct",✅,✅,✅
-"Qwen/Qwen3-32B",✅,✅,✅
+"Qwen/Qwen3-4B",✅,✅,✅
 "google/gemma-3-27b-it",✅,✅,✅
+"Qwen/Qwen3-32B",✅,✅,✅
+"meta-llama/Llama-Guard-4-12B",✅,✅,✅
 "meta-llama/Llama-3.1-8B-Instruct",✅,✅,✅
 "Qwen/Qwen3-30B-A3B",✅,✅,✅
-"Qwen/Qwen3-4B",✅,✅,✅
