Pinned

  1. vllm Public

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 85.2k stars · 18.8k forks

  2. vllm-omni Public

    A framework for efficient model inference with omni-modality models

    Python · 5.4k stars · 1.2k forks

  3. recipes Public

    Common recipes to run vLLM

    JavaScript · 896 stars · 320 forks

  4. llm-compressor Public

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Python · 3.5k stars · 561 forks

  5. speculators Public

    A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

    Python · 563 stars · 126 forks

  6. semantic-router Public

    System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge

    Go · 4.7k stars · 727 forks

Repositories

Showing 10 of 42 repositories
  • vllm-gaudi Public

    Community-maintained hardware plugin for vLLM on Intel Gaudi

    Python · 43 stars · Apache-2.0 · 139 forks · 2 issues · 52 pull requests · Updated Jul 2, 2026
  • speculators Public

    A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

    Python · 563 stars · Apache-2.0 · 126 forks · 23 issues (1 issue needs help) · 41 pull requests · Updated Jul 2, 2026
  • vllm Public

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 85,170 stars · Apache-2.0 · 18,835 forks · 1,972 issues (39 issues need help) · 3,539 pull requests · Updated Jul 2, 2026
  • guidellm Public

    Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

    Python · 1,334 stars · Apache-2.0 · 176 forks · 49 issues · 20 pull requests · Updated Jul 2, 2026
  • llm-compressor Public

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Python · 3,487 stars · Apache-2.0 · 561 forks · 53 issues (7 issues need help) · 77 pull requests · Updated Jul 2, 2026
  • agentic-api Public

    Stateful API logic for agentic applications using vLLM

    Rust · 40 stars · Apache-2.0 · 14 forks · 9 issues · 7 pull requests · Updated Jul 2, 2026
  • tpu-inference Public

    TPU inference for vLLM, with unified JAX and PyTorch support.

    Python · 375 stars · Apache-2.0 · 237 forks · 64 issues (3 issues need help) · 287 pull requests · Updated Jul 3, 2026
  • compressed-tensors Public

    A safetensors extension to efficiently store sparse quantized tensors on disk

    Python · 294 stars · Apache-2.0 · 100 forks · 9 issues (3 issues need help) · 32 pull requests · Updated Jul 3, 2026
  • vllm-ascend Public

    Community-maintained hardware plugin for vLLM on Ascend

    C++ · 2,341 stars · Apache-2.0 · 1,519 forks · 1,526 issues (4 issues need help) · 727 pull requests · Updated Jul 3, 2026
  • recipes Public

    Common recipes to run vLLM

    JavaScript · 896 stars · Apache-2.0 · 320 forks · 32 issues · 97 pull requests · Updated Jul 2, 2026