# GuideLLM Benchmark Testing Best Practice

Run your first GuideLLM benchmark test from scratch using the vLLM Simulator.

## Getting Started

### 📦 1. Benchmark Testing Environment Setup

#### 1.1 Create a Conda Environment (recommended)

```bash
conda create -n guidellm-bench python=3.11 -y
conda activate guidellm-bench
```

#### 1.2 Install Dependencies

```bash
git clone https://github.com/vllm-project/guidellm.git
cd guidellm
pip install .
```

For more detailed instructions, refer to the [GuideLLM README](https://github.com/vllm-project/guidellm/blob/main/README.md).

#### 1.3 Verify Installation

```bash
guidellm --help
```

#### 1.4 Start an OpenAI-Compatible API in the vLLM Simulator Docker Container

```bash
docker pull ghcr.io/llm-d/llm-d-inference-sim:v0.4.0

docker run --rm --publish 8000:8000 \
  ghcr.io/llm-d/llm-d-inference-sim:v0.4.0 \
  --port 8000 \
  --model "Qwen/Qwen2.5-1.5B-Instruct" \
  --lora-modules '{"name":"tweet-summary-0"}' '{"name":"tweet-summary-1"}'
```
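
If you prefer to keep the simulator running in the background while you test, a minimal detached variant is sketched below (the container name `vllm-sim` is an arbitrary choice, not part of the project):

```bash
# Run detached under an arbitrary name so the container can be inspected and stopped later
docker run -d --name vllm-sim --publish 8000:8000 \
  ghcr.io/llm-d/llm-d-inference-sim:v0.4.0 \
  --port 8000 \
  --model "Qwen/Qwen2.5-1.5B-Instruct" \
  --lora-modules '{"name":"tweet-summary-0"}' '{"name":"tweet-summary-1"}'

# Follow the startup logs until the server reports it is listening
docker logs -f vllm-sim

# Clean up when finished
docker stop vllm-sim && docker rm vllm-sim
```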

For more detailed instructions, refer to the [vLLM Simulator documentation](https://llm-d.ai/docs/architecture/Components/inference-sim).

Available Docker image versions are listed under [Docker Images](https://github.com/llm-d/llm-d-inference-sim/pkgs/container/llm-d-inference-sim).

Check that the OpenAI-compatible API is working via curl:

- Check `/v1/models`:

```bash
curl --request GET 'http://localhost:8000/v1/models'
```

- Check `/v1/chat/completions`:

```bash
curl --request POST 'http://localhost:8000/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "tweet-summary-0",
    "stream": false,
    "messages": [{"role": "user", "content": "Say this is a test!"}]
  }'
```

- Check `/v1/completions`:

```bash
curl --request POST 'http://localhost:8000/v1/completions' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "tweet-summary-0",
    "stream": false,
    "prompt": "Say this is a test!",
    "max_tokens": 128
  }'
```
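
Optionally, assuming `jq` is installed, you can pull just the generated text out of the completion response:

```bash
# Extract only the generated text from the /v1/completions response (requires jq)
curl --silent --request POST 'http://localhost:8000/v1/completions' \
  --header 'Content-Type: application/json' \
  --data-raw '{"model": "tweet-summary-0", "prompt": "Say this is a test!", "max_tokens": 128}' \
  | jq -r '.choices[0].text'
```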

#### 1.5 Download the Tokenizer

Download the Qwen/Qwen3-0.6B `tokenizer.json` from [Qwen/Qwen3-0.6B](https://modelscope.cn/models/Qwen/Qwen3-0.6B/files) and save it to a local path.
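
One command-line sketch for fetching it, assuming the `huggingface_hub` CLI is installed and the Hugging Face Hub is reachable (otherwise, download `tokenizer.json` manually from the ModelScope link above); `${local_path}` is a placeholder for a directory of your choice:

```bash
# Assumes huggingface_hub is installed and the Hugging Face Hub is reachable;
# otherwise, download tokenizer.json manually from the ModelScope link above.
pip install -U huggingface_hub

# ${local_path} is a placeholder for a directory of your choice
huggingface-cli download Qwen/Qwen3-0.6B tokenizer.json --local-dir "${local_path}/Qwen3-0.6B"
```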

______________________________________________________________________

## 🚀 2. Running Benchmarks

```bash
guidellm benchmark \
  --target "http://localhost:8000/" \
  --model "tweet-summary-0" \
  --processor "${local_path}/Qwen3-0.6B" \
  --rate-type sweep \
  --max-seconds 10 \
  --max-requests 10 \
  --data "prompt_tokens=128,output_tokens=56"
```
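
To keep the results for later analysis, GuideLLM can also write the report to a file. A sketch follows; flag support may vary by GuideLLM version, so confirm with `guidellm benchmark --help`:

```bash
# Same run, but also save the full report for later analysis
# (confirm flag availability with `guidellm benchmark --help`)
guidellm benchmark \
  --target "http://localhost:8000/" \
  --model "tweet-summary-0" \
  --processor "${local_path}/Qwen3-0.6B" \
  --rate-type sweep \
  --max-seconds 10 \
  --max-requests 10 \
  --data "prompt_tokens=128,output_tokens=56" \
  --output-path benchmarks.json
```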

______________________________________________________________________

## 📊 3. Results Interpretation

![Sample benchmark output, part 1](../assets/sample-output1.png) ![Sample benchmark output, part 2](../assets/sample-output2.png) ![Sample benchmark output, part 3](../assets/sample-output3.png)

After the benchmark completes, the key metrics reported include:

- **`TTFT`**: Time to First Token
- **`TPOT`**: Time Per Output Token
- **`ITL`**: Inter-Token Latency
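
As a rough guide to how these relate (these are common approximations, not GuideLLM's authoritative definitions; see its documentation for the exact formulas):

```text
TTFT = time from sending the request until the first output token arrives
ITL  ≈ average gap between consecutive output tokens after the first
TPOT ≈ (end-to-end latency − TTFT) / number of output tokens
```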

Your first benchmark test is now complete.