
Commit b9ade5a

docs(vllm, sglang): update P/D disaggregation examples (#1811)
Signed-off-by: CYJiang <googs1025@gmail.com>
1 parent 341ee93 commit b9ade5a

File tree

2 files changed: +151 −12 lines changed

samples/disaggregation/sglang/README.md

Lines changed: 72 additions & 6 deletions
@@ -2,24 +2,58 @@

Note:

- The examples in this directory are for demonstration purposes only. Feel free to substitute your own model path, RDMA network, image, etc.
- Two routing strategies are supported:
  - Static routing using the SGLang upstream router (`aibrix/sglang-router:v0.1.6` image).
  - Dynamic routing via AIBrix Envoy Gateway with `routing-strategy: pd`.
- SGLang images are built with nixl. If you want to use mooncake-transfer-engine, change the configuration to `--disaggregation-transfer-backend=mooncake`. Note that mooncake requires RDMA to work.
- AIBrix StormService supports replica mode and pool mode; please refer to [AIBrix StormService](https://aibrix.readthedocs.io/latest/designs/aibrix-stormservice.html) for more details.
- `vke.volcengine.com/rdma`, `k8s.volcengine.com/pod-networks` and `NCCL_IB_GID_INDEX` are specific to Volcano Engine Cloud. Feel free to customize them for your own cluster.

## Configuration

### Build SGLang images with Nixl Support

```Dockerfile
# Build arguments for a flexible base image and nixl version.
# These can be overridden at build time with --build-arg.

# Default SGLang base image tag (matches tags from the official repos)
ARG SGLANG_BASE_TAG="v0.4.9.post3-cu126"

# Optional nixl version. Leave empty to install the latest available version.
ARG NIXL_VER=""

# Base image from the official SGLang project.
# Available tags can be found at:
# https://hub.docker.com/r/lmsysorg/sglang/tags
# This image is also mirrored/extended under:
# https://hub.docker.com/r/aibrix/sglang/tags
FROM lmsysorg/sglang:${SGLANG_BASE_TAG}

# Re-declare NIXL_VER: ARGs declared before FROM are not visible inside
# the build stage unless repeated here.
ARG NIXL_VER

# Install the nixl package.
# - If NIXL_VER is provided (non-empty), install the exact version: nixl==<version>
# - If NIXL_VER is empty or unset, install the latest version: nixl
RUN if [ -n "${NIXL_VER}" ]; then \
        echo "Installing nixl==${NIXL_VER}..."; \
        pip install "nixl==${NIXL_VER}"; \
    else \
        echo "Installing latest nixl..."; \
        pip install nixl; \
    fi
```
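The branching in the `RUN` step is plain POSIX shell, so the version-selection logic can be sanity-checked outside Docker. A minimal sketch with `pip install` stubbed out as an `echo` (the `install_nixl` helper is illustrative, not part of the repo):

```shell
#!/bin/sh
# Mirror of the Dockerfile's RUN step, with pip stubbed out so the
# version-selection logic can be dry-run anywhere.
install_nixl() {
  NIXL_VER="$1"
  if [ -n "${NIXL_VER}" ]; then
    echo "pip install nixl==${NIXL_VER}"
  else
    echo "pip install nixl"
  fi
}

install_nixl "0.4.1"   # pinned version
install_nixl ""        # latest
```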
```bash
# 1. Pin both the SGLang and nixl versions
docker build \
  --build-arg SGLANG_BASE_TAG=v0.4.9.post3-cu126 \
  --build-arg NIXL_VER=0.4.1 \
  -t aibrix/sglang:v0.4.9.post3-cu126-nixl-v0.4.1 .

# 2. Use a newer SGLang base tag with the latest nixl
docker build \
  --build-arg SGLANG_BASE_TAG=v0.5.0-cu126 \
  -t aibrix/sglang:v0.5.0-cu126-nixl-latest .
```

### Build SGLang Router Images

@@ -82,3 +116,35 @@ curl http://localhost:30000/v1/chat/completions \
  ]
}'
```

### Dynamic P/D Routing with AIBrix Gateway

#### Get Gateway Endpoint

```bash
LB_IP=$(kubectl get svc -n envoy-gateway-system \
  envoy-aibrix-system-aibrix-eg-903790dc \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

ENDPOINT="${LB_IP}:80"
echo "Gateway endpoint: http://${ENDPOINT}"
```

> The service name may vary; use `kubectl get svc -n envoy-gateway-system` to confirm.

#### Send a Disaggregated Request

```bash
curl -v "http://${ENDPOINT}/v1/chat/completions" \
  -H "routing-strategy: pd" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/models/Qwen2.5-7B-Instruct",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
```

> **Requirements**:
> - The `model` field must exactly match the `model.aibrix.ai/name` label in your deployment.
> - The `routing-strategy: pd` header **must be present** to enable P/D splitting.
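Since a mismatched `model` field fails to route, a quick pre-flight check before sending the request can save debugging time. A minimal sketch (the expected value and payload are illustrative, and the naive `sed` parse only handles this flat JSON):

```shell
#!/bin/sh
# Illustrative values; in a real cluster the expected name comes from
#   kubectl get pods -L model.aibrix.ai/name
expected_model="/models/Qwen2.5-7B-Instruct"
payload='{"model": "/models/Qwen2.5-7B-Instruct", "messages": [{"role": "user", "content": "Say this is a test!"}]}'

# Extract the model field from the request payload (naive parse,
# sufficient for this flat payload).
request_model=$(printf '%s' "$payload" | sed -n 's/.*"model": *"\([^"]*\)".*/\1/p')

if [ "$request_model" = "$expected_model" ]; then
  echo "model matches label"
else
  echo "mismatch: request=$request_model expected=$expected_model" >&2
  exit 1
fi
```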

samples/disaggregation/vllm/README.md

Lines changed: 79 additions & 6 deletions
@@ -2,28 +2,64 @@

Note:

- The examples in this directory are for demonstration purposes only. Feel free to substitute your own model path, RDMA network, image, etc.
- Two routing strategies are supported:
  - Static routing using the upstream vLLM sample router (`disagg_proxy_server.py`), suitable for quick testing.
  - Dynamic routing via AIBrix Envoy Gateway with `routing-strategy: pd`.
- To use this example, you need to build your own vLLM image with Nixl; guidance is provided below.
- AIBrix StormService supports replica mode and pool mode; please refer to [AIBrix StormService](https://aibrix.readthedocs.io/latest/designs/aibrix-stormservice.html) for more details.
- `vke.volcengine.com/rdma`, `k8s.volcengine.com/pod-networks` and `NCCL_IB_GID_INDEX` are specific to Volcano Engine Cloud. Feel free to customize them for your own cluster.

## Configuration

### Build vLLM images with Nixl Support

```Dockerfile
# Build arguments to enable a flexible base image and nixl version.
# Override at build time with --build-arg if needed.

# Default vLLM OpenAI-compatible image tag.
# Available tags: https://hub.docker.com/r/vllm/vllm-openai/tags
ARG VLLM_BASE_TAG="v0.9.2"

# Optional nixl version. If left empty, the latest version will be installed.
ARG NIXL_VER=""

# Base image from the official vLLM project.
# Available tags can be found at:
# https://hub.docker.com/r/vllm/vllm-openai/tags
# This image is also mirrored/extended under:
# https://hub.docker.com/r/aibrix/vllm-openai/tags
FROM vllm/vllm-openai:${VLLM_BASE_TAG}

# Re-declare NIXL_VER: ARGs declared before FROM are not visible inside
# the build stage unless repeated here.
ARG NIXL_VER

# Install the nixl package:
# - If NIXL_VER is specified, install the exact version: nixl==<version>
# - Otherwise, install the latest available version.
RUN if [ -n "${NIXL_VER}" ]; then \
        echo "Installing nixl==${NIXL_VER}..."; \
        pip install "nixl==${NIXL_VER}"; \
    else \
        echo "Installing latest nixl..."; \
        pip install nixl; \
    fi
```

```bash
# 1. Build with the default vLLM tag and a pinned nixl version
docker build \
  --build-arg NIXL_VER=0.4.1 \
  -t aibrix/vllm-openai:v0.9.2-nixl-v0.4.1 .

# 2. Build with a custom vLLM tag and the latest nixl
docker build \
  --build-arg VLLM_BASE_TAG=v0.10.2 \
  -t aibrix/vllm-openai:v0.10.2-nixl-latest .
```
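The image tag convention used above (`<base tag>-nixl-v<ver>` when pinned, `<base tag>-nixl-latest` otherwise) can be scripted so the tag always matches the build args. A small sketch (the `make_tag` helper is hypothetical, not part of the repo):

```shell
#!/bin/sh
# Hypothetical helper: derive the image tag from the base tag and an
# optional nixl version, mirroring the naming convention above.
make_tag() {
  base_tag="$1"   # e.g. v0.9.2
  nixl_ver="$2"   # e.g. 0.4.1, or empty for latest
  if [ -n "${nixl_ver}" ]; then
    echo "aibrix/vllm-openai:${base_tag}-nixl-v${nixl_ver}"
  else
    echo "aibrix/vllm-openai:${base_tag}-nixl-latest"
  fi
}

make_tag v0.9.2 0.4.1    # -> aibrix/vllm-openai:v0.9.2-nixl-v0.4.1
make_tag v0.10.2 ""      # -> aibrix/vllm-openai:v0.10.2-nixl-latest
```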
2357

2458
> Note: sample router has been included in the image. We do not need additional steps to build it.
2559
26-
## Start the Router
60+
## Option 1: Static Routing with disagg_proxy_server.py (Legacy)
61+
62+
### Start the Router
2763

2864
Currently, the router is very simple and it relies on user to pass the prefill and decode IPs.
2965
Launch the router `kubectl apply -f router.yaml`, ssh to the pod and launch the process.
@@ -64,3 +100,40 @@ curl http://localhost:8000/v1/completions \
  "temperature": 0
}'
```

---

## Option 2: Dynamic P/D Routing with AIBrix Gateway

AIBrix now supports **native Prefill/Decode disaggregation** through its Envoy Gateway integration. By setting the `routing-strategy: pd` header, requests are automatically split between prefill and decode pods for optimal performance.

### Get Gateway Endpoint

```bash
LB_IP=$(kubectl get svc -n envoy-gateway-system \
  envoy-aibrix-system-aibrix-eg-903790dc \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

ENDPOINT="${LB_IP}:80"
echo "Gateway endpoint: http://${ENDPOINT}"
```

> The service name may vary; use `kubectl get svc -n envoy-gateway-system` to confirm.
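On some clouds the load balancer publishes a hostname instead of an IP, so the `.ip` jsonpath comes back empty. A hedged sketch of the fallback logic, exercised here against a sample Service status rather than a live cluster (the sample JSON and the naive `sed` parse are assumptions for illustration):

```shell
#!/bin/sh
# Sample of .status.loadBalancer from `kubectl get svc -o json` on a
# cloud that assigns hostnames rather than IPs.
status='{"loadBalancer":{"ingress":[{"hostname":"lb.example.com"}]}}'

# Prefer the ingress IP; fall back to the hostname when it is absent.
LB_ADDR=$(printf '%s' "$status" | sed -n 's/.*"ip": *"\([^"]*\)".*/\1/p')
if [ -z "$LB_ADDR" ]; then
  LB_ADDR=$(printf '%s' "$status" | sed -n 's/.*"hostname": *"\([^"]*\)".*/\1/p')
fi

ENDPOINT="${LB_ADDR}:80"
echo "Gateway endpoint: http://${ENDPOINT}"
```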
### Send a Disaggregated Request

```bash
curl -v "http://${ENDPOINT}/v1/chat/completions" \
  -H "routing-strategy: pd" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8B",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
```

> **Requirements**:
> - The `model` field must exactly match the `model.aibrix.ai/name` label in your deployment.
> - The `routing-strategy: pd` header **must be present** to enable P/D splitting.
