Note:
- The examples in this directory are for demonstration purposes only. Feel free to substitute your own model path, RDMA network, image, etc.
- Two routing strategies are supported:
  - Static routing using the upstream vLLM sample router (`disagg_proxy_server.py`), suitable for quick testing.
  - Dynamic routing via the AIBrix Envoy Gateway with `routing-strategy: pd`.
- To use this example, you need to build your own vLLM image with Nixl; guidance is provided below.
- AIBrix StormService supports replica mode and pool mode; please refer to [AIBrix StormService](https://aibrix.readthedocs.io/latest/designs/aibrix-stormservice.html) for more details.
- `vke.volcengine.com/rdma`, `k8s.volcengine.com/pod-networks`, and `NCCL_IB_GID_INDEX` are specific to Volcano Engine Cloud. Feel free to customize them for your own cluster.

## Configuration

### Build vLLM Images with Nixl Support

```Dockerfile
# Build arguments allow a flexible base image and Nixl version.
# Override them at build time with --build-arg if needed.

# Default vLLM OpenAI-compatible image tag.
# Available tags: https://hub.docker.com/r/vllm/vllm-openai/tags
# This image is also mirrored/extended under:
# https://hub.docker.com/r/aibrix/vllm-openai/tags
ARG VLLM_BASE_TAG="v0.9.2"

# Optional Nixl version. If left empty, the latest version is installed.
ARG NIXL_VER=""

# Base image from the official vLLM project.
FROM vllm/vllm-openai:${VLLM_BASE_TAG}

# ARGs declared before FROM go out of scope after it; re-declare NIXL_VER
# so the RUN step below can see the value passed via --build-arg.
ARG NIXL_VER

# Install the nixl package:
# - If NIXL_VER is set, install that exact version: nixl==<version>
# - Otherwise, install the latest available version.
RUN if [ -n "${NIXL_VER}" ]; then \
      echo "Installing nixl==${NIXL_VER}..."; \
      pip install "nixl==${NIXL_VER}"; \
    else \
      echo "Installing latest nixl..."; \
      pip install nixl; \
    fi
```

```bash
# 1. Build with the default vLLM tag and a pinned Nixl version
docker build \
  --build-arg NIXL_VER=0.4.1 \
  -t aibrix/vllm-openai:v0.9.2-nixl-v0.4.1 .

# 2. Build with a custom vLLM tag and the latest Nixl
docker build \
  --build-arg VLLM_BASE_TAG=v0.10.0 \
  -t aibrix/vllm-openai:v0.10.0-nixl-latest .
```
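To keep the `--build-arg` values and the image tag from drifting apart, the tag can be derived from the build arguments. This is only a sketch: `image_tag` is a hypothetical helper, and the `aibrix/vllm-openai:<base>-nixl-<ver>` naming convention is taken from the examples above.

```shell
# Hypothetical helper: derive an image tag from the vLLM base tag and an
# optional pinned Nixl version, matching the naming used in the examples.
image_tag() {
  local base="$1" nixl="${2:-}"
  if [ -n "$nixl" ]; then
    echo "aibrix/vllm-openai:${base}-nixl-v${nixl}"
  else
    echo "aibrix/vllm-openai:${base}-nixl-latest"
  fi
}

image_tag v0.9.2 0.4.1   # -> aibrix/vllm-openai:v0.9.2-nixl-v0.4.1
image_tag v0.10.0        # -> aibrix/vllm-openai:v0.10.0-nixl-latest
```

The same helper output can then be passed to `-t` in the `docker build` commands above.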
> Note: the sample router is already included in the image, so no additional build step is required.

## Option 1: Static Routing with disagg_proxy_server.py (Legacy)

### Start the Router

Currently, the router is very simple: it relies on the user to pass the prefill and decode IPs.
Launch the router with `kubectl apply -f router.yaml`, then ssh into the pod and start the process.
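If the router expects comma-separated host lists, the pod IPs can be joined with a small helper before being passed in. This is a sketch only: `join_ips` is a hypothetical helper, and the sample router's exact flag names should be confirmed with its `--help` inside the pod.

```shell
# Hypothetical helper: join pod IPs into a comma-separated list, e.g. for
# passing prefill/decode host lists to the sample router.
join_ips() {
  local IFS=,
  echo "$*"
}

join_ips 10.0.0.11 10.0.0.12   # -> 10.0.0.11,10.0.0.12
```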
```bash
curl http://localhost:8000/v1/completions \
  ...
    "temperature": 0
  }'
```
---

## Option 2: Dynamic P/D Routing with AIBrix Gateway

AIBrix now supports **native Prefill/Decode disaggregation** through its Envoy Gateway integration. When the `routing-strategy: pd` header is set, requests are automatically split between prefill and decode pods for optimal performance.
### Get the Gateway Endpoint

```bash
LB_IP=$(kubectl get svc -n envoy-gateway-system \
  envoy-aibrix-system-aibrix-eg-903790dc \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

ENDPOINT="${LB_IP}:80"
echo "Gateway endpoint: http://${ENDPOINT}"
```

> The service name may vary; use `kubectl get svc -n envoy-gateway-system` to confirm.
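Some clouds publish a hostname rather than an IP for the LoadBalancer, in which case the jsonpath query above returns an empty string. A minimal sketch of a fallback, where `pick_endpoint` is a hypothetical helper and the stub values stand in for the real `kubectl` lookups:

```shell
# Hypothetical helper: prefer the LoadBalancer IP, fall back to its hostname.
# In practice the first two values come from jsonpath queries on
# .status.loadBalancer.ingress[0].ip and .status.loadBalancer.ingress[0].hostname.
pick_endpoint() {
  local ip="$1" host="$2" port="${3:-80}"
  if [ -n "$ip" ]; then
    echo "${ip}:${port}"
  else
    echo "${host}:${port}"
  fi
}

pick_endpoint "" "elb.example.com"   # -> elb.example.com:80
```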
### Send a Disaggregated Request

```bash
curl -v "http://${ENDPOINT}/v1/chat/completions" \
  -H "routing-strategy: pd" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8B",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
```
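To pull just the generated text out of the JSON response, the `curl` output can be piped through a small parser. A sketch assuming `python3` is available on the client; `extract_content` is a hypothetical helper, and the canned response below is illustrative only.

```shell
# Hypothetical helper: print the first choice's message content from an
# OpenAI-style chat-completion JSON response read on stdin.
extract_content() {
  python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Illustrative canned response; in practice pipe `curl -s ...` into it.
echo '{"choices":[{"message":{"content":"This is a test!"}}]}' | extract_content
# -> This is a test!
```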
| 135 | + |
| 136 | +> **Requirements**: |
| 137 | +> - The `model` field must exactly match the label `model.aibrix.ai/name` in your deployment. |
| 138 | +> - The `routing-strategy: pd` header **must be present** to enable P/D splitting. |
| 139 | +
|