
Commit 52ef473

Merge remote-tracking branch 'upstream/develop' into excel_develop
2 parents 7643b70 + 12ddd17 commit 52ef473


3 files changed: 228 additions, 0 deletions


.github/workflows/cpp-linter.yml

Lines changed: 34 additions & 0 deletions

```yaml
name: cpp-linter

on:
  push:
    branches: [ "*" ]
  pull_request:
    branches: [ "dev*", "main", "*release" ]


jobs:
  cpp-linter:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
        with:
          persist-credentials: false
      - uses: cpp-linter/cpp-linter-action@main
        id: linter
        continue-on-error: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          style: file
          tidy-checks: '-*'
          files-changed-only: true
          lines-changed-only: diff
          format-review: true
          thread-comments: ${{ github.event_name == 'pull_request' && 'update' }}

      - name: Fail fast?!
        if: steps.linter.outputs.checks-failed != 0
        run: |
          echo "some linter checks failed. ${{ steps.linter.outputs.checks-failed }}"
          exit 1
```

docs/source/index.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -57,6 +57,7 @@ getting-started/installation_npu
 user-guide/prefix-cache/index
 user-guide/sparse-attention/index
 user-guide/pd-disaggregation/index
+user-guide/metrics/metrics
 :::
 
 :::{toctree}
```
Lines changed: 193 additions & 0 deletions

# Observability

UCM (Unified Cache Management) exposes detailed metrics in Prometheus format, so you can monitor cache performance and behavior in depth. This document describes how to enable and configure these metrics, which are served from the `/metrics` API endpoint built into vLLM.

---

## Quick Start Guide

### 1) On the UCM Side

First, set the `PROMETHEUS_MULTIPROC_DIR` environment variable.

```bash
export PROMETHEUS_MULTIPROC_DIR=/vllm-workspace
```

Then, start vLLM with the UCM connector enabled.

```bash
export CUDA_VISIBLE_DEVICES=0
vllm serve /home/models/Qwen2.5-14B-Instruct \
  --max-model-len 5000 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.87 \
  --trust-remote-code \
  --disable-log-requests \
  --no-enable-prefix-caching \
  --enforce-eager \
  --max-num-batched-tokens 40000 \
  --max-num-seqs 10 \
  --host 0.0.0.0 \
  --port 8000 \
  --kv-transfer-config \
  '{
    "kv_connector": "UCMConnector",
    "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
      "UCM_CONFIG_FILE": "/vllm-workspace/unified-cache-management/examples/ucm_config.yaml"
    }
  }'
```

**Note**: To set the `metrics_config_path` parameter, refer to the example `ucm_config.yaml` file at https://github.com/ModelEngine-Group/unified-cache-management/tree/develop/examples.
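
Hand-writing JSON inside shell quotes is easy to get wrong. As a hedged alternative, the Python sketch below builds the same `--kv-transfer-config` value with `json.dumps` and prints a shell-safe command. The helper names `build_kv_transfer_config` and `build_serve_command` are hypothetical, and only a subset of the flags from the command above is included.

```python
import json
import shlex


def build_kv_transfer_config(ucm_config_file: str) -> str:
    """Hypothetical helper: return the --kv-transfer-config JSON for the UCM connector."""
    config = {
        "kv_connector": "UCMConnector",
        "kv_connector_module_path": "ucm.integration.vllm.ucm_connector",
        "kv_role": "kv_both",
        "kv_connector_extra_config": {"UCM_CONFIG_FILE": ucm_config_file},
    }
    return json.dumps(config)


def build_serve_command(model: str, ucm_config_file: str, port: int = 8000) -> list[str]:
    """Hypothetical helper: argv for `vllm serve` with a subset of the flags above."""
    return [
        "vllm", "serve", model,
        "--host", "0.0.0.0",
        "--port", str(port),
        "--kv-transfer-config", build_kv_transfer_config(ucm_config_file),
    ]


if __name__ == "__main__":
    argv = build_serve_command(
        "/home/models/Qwen2.5-14B-Instruct",
        "/vllm-workspace/unified-cache-management/examples/ucm_config.yaml",
    )
    # shlex.join quotes the JSON argument so the printed line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` instead of printing it avoids shell quoting entirely, since each list element reaches vLLM as one argument.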

To generate traffic so that the UCM metrics have data to report, you can run a benchmark with `vllm bench serve`:

```bash
vllm bench serve \
  --backend vllm \
  --model /home/models/Qwen2.5-14B-Instruct \
  --host 127.0.0.1 \
  --port 8000 \
  --dataset-name random \
  --num-prompts 20 \
  --random-input-len 200 \
  --random-output-len 10 \
  --request-rate 1 \
  --ignore-eos
```

Once the HTTP server is running, you can access the UCM metrics at the `/metrics` endpoint:

```bash
curl http://<vllm-worker-ip>:8000/metrics | grep ucm:
```
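
You will also find some `.db` files in the `$PROMETHEUS_MULTIPROC_DIR` directory. These are temporary files that the Prometheus Python client (`prometheus_client`) writes in multiprocess mode so that metrics from all worker processes can be aggregated into a single `/metrics` response.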
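
The `prometheus_client` documentation recommends wiping the multiprocess directory between runs, because stale `.db` files from a previous process would otherwise be aggregated into the new output. A minimal preflight sketch is shown below; `prepare_multiproc_dir` is a hypothetical helper, and it assumes the directory holds no other `.db` files you want to keep (a dedicated directory is safer than a shared workspace).

```python
import os
from pathlib import Path
from typing import Optional


def prepare_multiproc_dir(path: Optional[str] = None) -> Path:
    """Create the multiprocess directory if needed and delete stale *.db files.

    Run this before starting vLLM, never while it is running: the .db files are
    the live metric stores of the running processes.
    """
    directory = Path(path or os.environ["PROMETHEUS_MULTIPROC_DIR"])
    directory.mkdir(parents=True, exist_ok=True)
    for db_file in directory.glob("*.db"):
        db_file.unlink()  # only *.db files are removed; other files are left untouched
    return directory
```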

### 2) Start Prometheus and Grafana with Docker Compose

#### Create Docker Compose Configuration Files

First, create the `docker-compose.yaml` file:

```yaml
# docker-compose.yaml
version: "3"

services:
  prometheus:
    image: prom/prometheus:latest
    extra_hosts:
      - "host.docker.internal:host-gateway"
    ports:
      - "9090:9090"
    volumes:
      - ${PWD}/prometheus.yaml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    depends_on:
      - prometheus
    ports:
      - "3000:3000"
```

Then, create the `prometheus.yaml` configuration file:

```yaml
# prometheus.yaml
global:
  scrape_interval: 5s
  evaluation_interval: 30s

scrape_configs:
  - job_name: vllm
    static_configs:
      - targets:
          - 'host.docker.internal:8000'
```

**Note**: Make sure the port number in `prometheus.yaml` matches the port used when starting the vLLM service (`--port 8000` in the example above).

#### Start Services

Run the following command in the directory containing `docker-compose.yaml` and `prometheus.yaml`:

```bash
docker compose up
```

This will start the Prometheus and Grafana services.
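
After `docker compose up`, you can confirm that Prometheus is actually scraping vLLM by sending an instant query to the Prometheus HTTP API (`/api/v1/query`) on the port published in `docker-compose.yaml`. The sketch below is one hedged way to do that from the Docker host; the helper names are hypothetical, and `job="vllm"` assumes the `job_name` from `prometheus.yaml` above.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def target_health(response_body: str) -> dict[str, bool]:
    """Map each scraped instance to True if its `up` sample equals 1."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    # Vector results carry the value as [unix_timestamp, "<value as string>"].
    return {
        series["metric"].get("instance", "?"): series["value"][1] == "1"
        for series in payload["data"]["result"]
    }


def check_vllm_target(prometheus_url: str = "http://localhost:9090") -> dict[str, bool]:
    """Query up{job="vllm"} and report per-instance scrape health."""
    url = build_query_url(prometheus_url, 'up{job="vllm"}')
    with urllib.request.urlopen(url, timeout=10) as resp:
        return target_health(resp.read().decode("utf-8"))
```

An empty result or a `False` entry usually means the target address or port in `prometheus.yaml` is wrong; the same information is visible in the Prometheus web UI under `http://localhost:9090/targets`.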

### 3) Configure Grafana Dashboard

#### Access Grafana

Navigate to `http://<your-host>:3000`. Log in with the default username (`admin`) and password (`admin`). You will be prompted to change the password on first login.

#### Add Prometheus Data Source

1. Navigate to `http://<your-host>:3000/connections/datasources/new` and select **Prometheus**.

2. On the Prometheus configuration page, enter the Prometheus server URL in the **Connection** section. In this Docker Compose setup, Grafana and Prometheus run in separate containers, but Docker Compose registers each service name as a DNS name on the shared network, so you can use `http://prometheus:9090` directly.

3. Click **Save & Test**. You should see a green checkmark with the message "Successfully queried the Prometheus API."
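
The same data source can also be registered without the UI through Grafana's HTTP API (`POST /api/datasources`), which is handy for scripted setups. The sketch below only builds the request; `datasource_request` is a hypothetical helper, the data source name `Prometheus` is an arbitrary choice, and it assumes basic authentication with the admin account you set on first login.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url: str, user: str, password: str,
                       prometheus_url: str = "http://prometheus:9090") -> urllib.request.Request:
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    body = {
        "name": "Prometheus",   # display name; arbitrary choice for this sketch
        "type": "prometheus",
        "url": prometheus_url,  # resolved from inside the Grafana container
        "access": "proxy",      # Grafana's backend performs the queries
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(datasource_request("http://localhost:3000", "admin", "<new-password>"))`.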

#### Import Dashboard

1. Navigate to `http://<your-host>:3000/dashboard/import`.

2. Click **Upload JSON file**, then upload the `unified-cache-management/examples/metrics/grafana.json` file.

3. Select the Prometheus data source configured earlier.

4. Click **Import** to complete the import.

You should now see the UCM monitoring dashboard, which visualizes all 9 metrics in real time.
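
If a panel stays empty, it can help to check which UCM metrics the dashboard actually queries. The sketch below walks a Grafana dashboard JSON model and collects `ucm:` metric names from each panel's `targets[].expr`, folding histogram series (`_bucket`, `_sum`, `_count`) back to their base name. It assumes `grafana.json` follows Grafana's usual dashboard model (a top-level `panels` list, with collapsed rows nesting their own `panels`); all function names are hypothetical.

```python
import json
import re
from typing import Iterator

_HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")


def iter_panels(panels: list) -> Iterator[dict]:
    """Yield every panel, including panels nested inside collapsed rows."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def base_name(name: str) -> str:
    """Strip a histogram series suffix, e.g. ucm:load_duration_bucket -> ucm:load_duration."""
    for suffix in _HISTOGRAM_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def ucm_metrics_in_dashboard(dashboard: dict, prefix: str = "ucm:") -> set[str]:
    """Collect base metric names with `prefix` referenced by the dashboard's PromQL targets."""
    pattern = re.compile(re.escape(prefix) + r"[a-zA-Z0-9_:]+")
    found = set()
    for panel in iter_panels(dashboard.get("panels", [])):
        for target in panel.get("targets", []):
            found.update(base_name(m) for m in pattern.findall(target.get("expr", "")))
    return found


def load_dashboard(path: str) -> dict:
    """Read a dashboard JSON file such as examples/metrics/grafana.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `sorted(ucm_metrics_in_dashboard(load_dashboard("unified-cache-management/examples/metrics/grafana.json")))` lists the metrics the dashboard expects, which can be compared against the names returned by `/metrics`.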

## Available Metrics

UCM exposes the following metrics to monitor its performance, grouped by category:

| Metric Name | Type | Description |
|------------|------|-------------|
| **Load Operation Metrics** | | |
| `ucm:load_requests_num` | Histogram | Number of requests loaded per `start_load_kv` call |
| `ucm:load_blocks_num` | Histogram | Number of blocks loaded per `start_load_kv` call |
| `ucm:load_duration` | Histogram | Time to load KV cache from UCM (milliseconds) |
| `ucm:load_speed` | Histogram | Speed of loading from UCM (GB/s) |
| **Save Operation Metrics** | | |
| `ucm:save_requests_num` | Histogram | Number of requests saved per `wait_for_save` call |
| `ucm:save_blocks_num` | Histogram | Number of blocks saved per `wait_for_save` call |
| `ucm:save_duration` | Histogram | Time to save to UCM (milliseconds) |
| `ucm:save_speed` | Histogram | Speed of saving to UCM (GB/s) |
| **Lookup Hit Rate Metrics** | | |
| `ucm:interval_lookup_hit_rates` | Histogram | Hit rate of UCM lookup requests |
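
Because every metric above is a histogram, Prometheus stores it as cumulative `_bucket` counters plus `_sum` and `_count`. The mean observation is `_sum / _count`, and percentiles are estimated by linear interpolation inside the bucket that contains the requested rank, which is what PromQL does for a query such as `histogram_quantile(0.9, sum by (le) (rate(ucm:load_duration_bucket[5m])))`. The sketch below reimplements that estimate in Python purely for illustration; it follows the documented interpolation behavior but is not Prometheus's code, and the bucket values in the usage note are invented.

```python
import math


def histogram_mean(total_sum: float, count: float) -> float:
    """Mean observation, i.e. <metric>_sum / <metric>_count (NaN when empty)."""
    return total_sum / count if count else math.nan


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total_count),
    mirroring the le="+Inf" bucket of a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("the last bucket must have an upper bound of +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket (prev_bound, bound].
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan
```

With buckets `[(1, 2), (5, 6), (10, 9), (math.inf, 10)]`, the median rank is 5 of 10; it falls in the `(1, 5]` bucket, 3 of that bucket's 4 observations in, so `histogram_quantile(0.5, ...)` returns `4.0`. The estimate is only as fine as the configured buckets.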

## Prometheus Configuration

Metrics are configured in the `unified-cache-management/examples/metrics/metrics_configs.yaml` file:

```yaml
log_interval: 5  # Interval in seconds for logging metrics

prometheus:
  multiproc_dir: "/vllm-workspace"  # Prometheus multiprocess directory
  metric_prefix: "ucm:"  # Metric name prefix

  enabled_metrics:
    counters: true
    gauges: true
    histograms: true

  histograms:
    - name: "load_requests_num"
      documentation: "Number of requests loaded from ucm"
      buckets: [1, 5, 10, 20, 50, 100, 200, 500, 1000]
    # ... other metric configurations
```
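
Each entry under `histograms` appears to combine `metric_prefix` and `name` into the exported metric name (`ucm:` + `load_requests_num` gives `ucm:load_requests_num`, matching the table above) and uses `buckets` as the histogram's upper bounds. To show what those bounds mean, the hypothetical helper below counts observations into cumulative `le` buckets the way a Prometheus histogram reports them; it is an illustration of the semantics, not UCM's implementation.

```python
import bisect
import math


def cumulative_buckets(observations: list[float], bounds: list[float]) -> list[tuple[float, int]]:
    """Return (upper_bound, number of observations <= upper_bound), ending with +Inf.

    Mirrors Prometheus histogram semantics: bucket le=b counts every observation <= b,
    so counts never decrease and the +Inf bucket equals the total count.
    """
    edges = sorted(bounds) + [math.inf]
    per_bucket = [0] * len(edges)
    for value in observations:
        # bisect_left finds the first edge >= value, i.e. the smallest bucket holding it.
        per_bucket[bisect.bisect_left(edges, value)] += 1
    result, running = [], 0
    for edge, n in zip(edges, per_bucket):
        running += n
        result.append((edge, running))
    return result
```

For instance, with the `load_requests_num` bounds, a `start_load_kv` call that loads 3 requests is counted in the `le="5"` bucket and in every larger one, so choosing bounds close to the values you expect gives sharper percentile estimates.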

---
