Note:
- The examples in this directory are for demonstration purposes only. Feel free to substitute your own model path, RDMA network, image, etc.
- Two routing strategies are supported:
  - Static routing using the upstream vLLM sample router (`disagg_proxy_server.py`), suitable for quick testing.
  - Dynamic routing via the AIBrix Envoy Gateway with `routing-strategy: pd`.
- To use this example, you need to build your own vLLM image with Nixl; guidance is provided below.
- AIBrix StormService supports replica mode and pool mode; please refer to [AIBrix StormService](https://aibrix.readthedocs.io/latest/designs/aibrix-stormservice.html) for more details.
- `vke.volcengine.com/rdma`, `k8s.volcengine.com/pod-networks`, and `NCCL_IB_GID_INDEX` are specific to Volcano Engine Cloud. Feel free to customize them for your own cluster.

## Configuration

### Build vLLM Images with Nixl Support

```Dockerfile
# Build arguments allow a flexible base image and Nixl version.
# Override them at build time with --build-arg if needed.

# Default vLLM OpenAI-compatible image tag.
# Available tags: https://hub.docker.com/r/vllm/vllm-openai/tags
# This image is also mirrored/extended under:
# https://hub.docker.com/r/aibrix/vllm-openai/tags
ARG VLLM_BASE_TAG="v0.9.2"

# Optional Nixl version. If left empty, the latest version is installed.
ARG NIXL_VER=""

# Base image from the official vLLM project.
FROM vllm/vllm-openai:${VLLM_BASE_TAG}

# ARGs declared before FROM go out of scope after it; re-declare NIXL_VER
# so the RUN step below can see the value passed via --build-arg.
ARG NIXL_VER

# Install the nixl package:
# - If NIXL_VER is set, install that exact version: nixl==<version>
# - Otherwise, install the latest available version.
RUN if [ -n "${NIXL_VER}" ]; then \
      echo "Installing nixl==${NIXL_VER}..."; \
      pip install "nixl==${NIXL_VER}"; \
    else \
      echo "Installing latest nixl..."; \
      pip install nixl; \
    fi
```

```bash
# 1. Build with the default vLLM tag and a pinned Nixl version
docker build \
  --build-arg NIXL_VER=0.4.1 \
  -t aibrix/vllm-openai:v0.9.2-nixl-v0.4.1 .

# 2. Build with a custom vLLM tag and the latest Nixl
docker build \
  --build-arg VLLM_BASE_TAG=v0.10.0 \
  -t aibrix/vllm-openai:v0.10.0-nixl-latest .
```
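To keep the `--build-arg` values and the image tag from drifting apart, the tag can be derived from the build arguments. This is only a sketch: `image_tag` is a hypothetical helper, and the `aibrix/vllm-openai:<base>-nixl-<ver>` naming convention is taken from the examples above.

```shell
# Hypothetical helper: derive an image tag from the vLLM base tag and an
# optional pinned Nixl version, matching the naming used in the examples.
image_tag() {
  local base="$1" nixl="${2:-}"
  if [ -n "$nixl" ]; then
    echo "aibrix/vllm-openai:${base}-nixl-v${nixl}"
  else
    echo "aibrix/vllm-openai:${base}-nixl-latest"
  fi
}

image_tag v0.9.2 0.4.1   # -> aibrix/vllm-openai:v0.9.2-nixl-v0.4.1
image_tag v0.10.0        # -> aibrix/vllm-openai:v0.10.0-nixl-latest
```

The same helper output can then be passed to `-t` in the `docker build` commands above.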
> Note: the sample router is already included in the image, so no additional build step is required.

## Option 1: Static Routing with disagg_proxy_server.py (Legacy)

### Start the Router

Currently, the router is very simple: it relies on the user to pass the prefill and decode IPs.
Launch the router with `kubectl apply -f router.yaml`, then ssh into the pod and start the process.
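If the router expects comma-separated host lists, the pod IPs can be joined with a small helper before being passed in. This is a sketch only: `join_ips` is a hypothetical helper, and the sample router's exact flag names should be confirmed with its `--help` inside the pod.

```shell
# Hypothetical helper: join pod IPs into a comma-separated list, e.g. for
# passing prefill/decode host lists to the sample router.
join_ips() {
  local IFS=,
  echo "$*"
}

join_ips 10.0.0.11 10.0.0.12   # -> 10.0.0.11,10.0.0.12
```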
```bash
curl http://localhost:8000/v1/completions \
  ...
    "temperature": 0
  }'
```
---

## Option 2: Dynamic P/D Routing with AIBrix Gateway

AIBrix now supports **native Prefill/Decode disaggregation** through its Envoy Gateway integration. When the `routing-strategy: pd` header is set, requests are automatically split between prefill and decode pods for optimal performance.
### Get the Gateway Endpoint

```bash
LB_IP=$(kubectl get svc -n envoy-gateway-system \
  envoy-aibrix-system-aibrix-eg-903790dc \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

ENDPOINT="${LB_IP}:80"
echo "Gateway endpoint: http://${ENDPOINT}"
```

> The service name may vary; use `kubectl get svc -n envoy-gateway-system` to confirm.
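Some clouds publish a hostname rather than an IP for the LoadBalancer, in which case the jsonpath query above returns an empty string. A minimal sketch of a fallback, where `pick_endpoint` is a hypothetical helper and the stub values stand in for the real `kubectl` lookups:

```shell
# Hypothetical helper: prefer the LoadBalancer IP, fall back to its hostname.
# In practice the first two values come from jsonpath queries on
# .status.loadBalancer.ingress[0].ip and .status.loadBalancer.ingress[0].hostname.
pick_endpoint() {
  local ip="$1" host="$2" port="${3:-80}"
  if [ -n "$ip" ]; then
    echo "${ip}:${port}"
  else
    echo "${host}:${port}"
  fi
}

pick_endpoint "" "elb.example.com"   # -> elb.example.com:80
```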
### Send a Disaggregated Request

```bash
curl -v "http://${ENDPOINT}/v1/chat/completions" \
  -H "routing-strategy: pd" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8B",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
```
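To pull just the generated text out of the JSON response, the `curl` output can be piped through a small parser. A sketch assuming `python3` is available on the client; `extract_content` is a hypothetical helper, and the canned response below is illustrative only.

```shell
# Hypothetical helper: print the first choice's message content from an
# OpenAI-style chat-completion JSON response read on stdin.
extract_content() {
  python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Illustrative canned response; in practice pipe `curl -s ...` into it.
echo '{"choices":[{"message":{"content":"This is a test!"}}]}' | extract_content
# -> This is a test!
```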
| 135 | + |
| 136 | +> **Requirements**: |
| 137 | +> - The `model` field must exactly match the label `model.aibrix.ai/name` in your deployment. |
| 138 | +> - The `routing-strategy: pd` header **must be present** to enable P/D splitting. |
| 139 | +
|