# vLLM LoRA Feature Test Suite

This test suite validates vLLM's LoRA adapter memory management, including LRU eviction, CPU-GPU swapping, pinning, and API behaviors.

## Prerequisites

- Python 3.10+
- vLLM installed with LoRA support
- GPU with sufficient memory (recommended: 24GB+ for Qwen3-8B)
- Required packages: `pytest`, `pytest-asyncio`, `torch`, `safetensors`, `transformers`, `vllm`

```bash
pip install pytest pytest-asyncio torch safetensors transformers vllm
```

## Quick Start

```bash
# 1. Generate test LoRA adapters
python create_test_loras.py --base-model Qwen/Qwen3-8B --num-loras 10 --ranks 8,16,32

# 2. Run all tests
python run_all_tests.py
```

## Test Coverage

The test suite validates the following behaviors (a sketch of how they map to vLLM's engine arguments follows the list):

- LRU eviction behavior
- CPU-GPU swap latency
- LoRA pinning functionality
- Memory isolation between LoRA and KV cache
- Memory budget and pre-allocation
- Batching constraints with `max_loras`
- Concurrent request handling
- API idempotency
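
Several of these behaviors are controlled directly by vLLM's engine arguments. For orientation, here is a minimal offline-inference sketch (not part of this suite; the adapter name and path are placeholders):

```python
# Hedged sketch of the knobs under test; values are examples only.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen3-8B",
    enable_lora=True,
    max_loras=4,       # adapters that can be active on the GPU in one batch
    max_cpu_loras=8,   # CPU-side cache; exceeding max_loras triggers LRU swaps
    max_lora_rank=32,  # must be >= the rank of any adapter you load
)

outputs = llm.generate(
    "Hello, how are you today?",
    SamplingParams(max_tokens=50),
    # placeholder adapter directory, e.g. one produced by create_test_loras.py
    lora_request=LoRARequest("demo-lora", 1, "/tmp/test_loras/demo-lora"),
)
print(outputs[0].outputs[0].text)
```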

## Artifact Preparation

### Option 1: Generate Dummy LoRAs (Recommended for Testing)

Generate dummy LoRA adapters that match your base model's architecture:

```bash
# Basic usage - creates 10 LoRAs with rank 8
python create_test_loras.py --base-model Qwen/Qwen3-8B

# Multiple ranks for swap latency testing
python create_test_loras.py --base-model Qwen/Qwen3-8B --num-loras 10 --ranks 8,16,32

# Include MLP modules for larger LoRAs
python create_test_loras.py --base-model Qwen/Qwen3-8B \
    --target-modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj

# Custom output directory
python create_test_loras.py --base-model Qwen/Qwen3-8B \
    --output-dir /path/to/loras
```
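
For reference, each generated adapter is expected to follow the standard PEFT on-disk layout: `adapter_config.json` plus `adapter_model.safetensors`. A rough sketch of that layout, with a single `q_proj` pair and assumed 4096-wide dimensions (illustrative, not read from the real model):

```python
# Hedged sketch of the PEFT-style files a dummy-LoRA generator writes.
# The single layer, module set, and 4096 dims are assumptions.
import json, os
import torch
from safetensors.torch import save_file

rank = 8
out_dir = "/tmp/test_loras/demo-lora"
os.makedirs(out_dir, exist_ok=True)

# adapter_config.json describes how the weights attach to the base model.
config = {
    "peft_type": "LORA",
    "base_model_name_or_path": "Qwen/Qwen3-8B",
    "r": rank,
    "lora_alpha": 16,
    "target_modules": ["q_proj"],
    "task_type": "CAUSAL_LM",
}
with open(os.path.join(out_dir, "adapter_config.json"), "w") as f:
    json.dump(config, f, indent=2)

# One lora_A/lora_B pair for layer 0; a real adapter covers every layer.
# Shapes: lora_A is (r, in_features), lora_B is (out_features, r).
prefix = "base_model.model.model.layers.0.self_attn.q_proj"
save_file(
    {
        f"{prefix}.lora_A.weight": torch.randn(rank, 4096, dtype=torch.float16) * 0.01,
        f"{prefix}.lora_B.weight": torch.zeros(4096, rank, dtype=torch.float16),
    },
    os.path.join(out_dir, "adapter_model.safetensors"),
)
```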

#### Estimate LoRA Sizes

```bash
python create_test_loras.py --base-model Qwen/Qwen3-8B --ranks 8,16,32,64 --estimate
```
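
The estimate is simple arithmetic: a rank-`r` adapter adds `r * (d_in + d_out)` parameters per target module per layer. A back-of-envelope check, where the per-module dimensions and 36-layer count are assumptions about Qwen3-8B rather than values read from its config:

```python
# Hedged estimate; dims below are assumed (q/o: 4096->4096, k/v: 4096->1024).
def lora_megabytes(rank, modules, num_layers, bytes_per_param=2):  # fp16
    params = sum(rank * (d_in + d_out) for d_in, d_out in modules) * num_layers
    return params * bytes_per_param / 1e6

attn = [(4096, 4096), (4096, 1024), (4096, 1024), (4096, 4096)]  # q, k, v, o
for r in (8, 16, 32, 64):
    print(f"rank {r:2d}: ~{lora_megabytes(r, attn, num_layers=36):.0f} MB")
```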

#### Verify Generated Adapters

```bash
python create_test_loras.py --verify --output-dir /tmp/test_loras
```

### Option 2: Use Real LoRA from HuggingFace

Edit `test_config.py` to use a real LoRA adapter:

```python
# In test_config.py
LORA_MODEL = "your-org/your-lora-adapter"
```

The test suite automatically falls back to `LORA_MODEL` if generated LoRAs are not found.
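
A hedged sketch of what that fallback amounts to (the helper name is illustrative, not the suite's actual code):

```python
# Illustrative resolution logic: prefer a generated adapter on disk,
# otherwise fall back to the HuggingFace adapter named in test_config.py.
import os
from test_config import LORA_BASE_PATH, LORA_MODEL

def resolve_lora(name: str) -> str:
    local = os.path.join(LORA_BASE_PATH, name)
    return local if os.path.isdir(local) else LORA_MODEL
```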

## Running Tests

### Run All Tests

```bash
python run_all_tests.py
```

Options:
- `--timeout 600` - Timeout per test file in seconds (default: 600)
- `--quiet` - Reduce output verbosity

### Run Specific Tests with pytest

```bash
# All tests in a file
pytest test_lru_swap_pinning.py -v -s

# Specific test class
pytest test_memory.py::TestMemoryBudget -v -s

# Specific test method
pytest test_api.py::TestDynamicLoading::test_reload_same_lora -v -s

# Run tests matching a keyword
pytest -v -s -k "eviction"
```
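
If the API tests (e.g. `TestDynamicLoading`) drive a running server, they presumably use vLLM's runtime LoRA endpoints on the OpenAI-compatible server, which require the server to be started with `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`. A hedged example of those calls (the adapter name and path are placeholders):

```bash
# Load an adapter into a running server, then unload it.
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "demo-lora", "lora_path": "/tmp/test_loras/demo-lora"}'

curl -X POST http://localhost:8000/v1/unload_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "demo-lora"}'
```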

## Configuration

Edit `test_config.py` to customize:

```python
# Base model
BASE_MODEL = "Qwen/Qwen3-8B"

# Generated LoRA directory
LORA_BASE_PATH = "/tmp/test_loras"

# Fallback LoRA (if generated not found)
LORA_MODEL = "mtzig/qwen3-8b-tfdark-lora2"

# Test parameters
TEST_PROMPT = "Hello, how are you today?"
MAX_TOKENS = 50
```

## Troubleshooting

### LoRA Loading Fails

1. Verify LoRA adapters exist: `python create_test_loras.py --verify`
2. Check base model compatibility
3. Ensure `max_lora_rank` >= the actual LoRA rank (see the serving example below)
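
If you serve the model rather than running it offline, the cap is set on the command line; a hedged example (flag values are illustrative):

```bash
# Raise the rank cap so higher-rank test adapters can load.
vllm serve Qwen/Qwen3-8B --enable-lora --max-lora-rank 32 --max-loras 4
```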
