Paper: SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
Thank you for your interest in SparseServe! Please star the repository and stay tuned; we will release the code here soon.
If you find this work useful, please cite:

@misc{zhou2025sparseserveunlockingparallelismdynamic,
  title={SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving},
  author={Qihui Zhou and Peiqi Yin and Pengfei Zuo and James Cheng},
  year={2025},
  eprint={2509.24626},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2509.24626},
}