Hi,
Thank you for your excellent work on this project!
I wanted to share that I have implemented vLLM support for Fun-ASR inference. It currently decodes about 1.5x faster than the Hugging Face PyTorch baseline (218.2 s vs. 145.6 s on the benchmark below), i.e. roughly a 50% speedup. The implementation is available in this repository:
https://github.com/yuekaizhang/Fun-ASR-vllm
Benchmark Details
Dataset: SPEECHIO_ASR_ZH00007 (approx. 1 hour of audio)
Hardware: Single NVIDIA H20 GPU

| Mode | Decoding Time (s) | RTF | RTFx | CER | Note |
|------|------------------:|----:|-----:|----:|------|
| Hugging Face PyTorch | 218.2 | 0.06 | 16.5 | 7.02% | batch_size=1 |
| vLLM (Qwen3-0.6B) | 145.6 | 0.04 | 24.7 | 6.99% | batch_size=1 |

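As a quick consistency check on the table: RTF (real-time factor) is decoding time divided by audio duration, and RTFx is its inverse. The sketch below assumes the audio is exactly 3600 s (the post only says "approx. 1 hour") and recomputes both columns plus the speedup from the decoding times alone:

```python
# Assumption: "approx. 1 hour" of audio is treated as exactly 3600 seconds.
AUDIO_SECONDS = 3600.0

# Decoding times (seconds) taken from the benchmark table.
decode_seconds = {
    "Hugging Face PyTorch": 218.2,
    "vLLM (Qwen3-0.6B)": 145.6,
}


def rtf(decode_s: float, audio_s: float = AUDIO_SECONDS) -> float:
    """Real-time factor: processing time / audio duration (lower is faster)."""
    return decode_s / audio_s


def rtfx(decode_s: float, audio_s: float = AUDIO_SECONDS) -> float:
    """Inverse real-time factor: audio seconds processed per second (higher is faster)."""
    return audio_s / decode_s


for mode, t in decode_seconds.items():
    print(f"{mode}: RTF={rtf(t):.2f}, RTFx={rtfx(t):.1f}")

# Speedup of vLLM over the PyTorch baseline, as a ratio of decoding times.
speedup = decode_seconds["Hugging Face PyTorch"] / decode_seconds["vLLM (Qwen3-0.6B)"]
print(f"speedup: {speedup:.2f}x")
```

The recomputed values match the table (RTF 0.06 / 0.04, RTFx 16.5 / 24.7), and the ratio of about 1.50x is where the "roughly 50% faster" figure comes from; equivalently, wall-clock decoding time drops by about a third.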
Any feedback or attention would be greatly appreciated!