Load testing vLLM model concurrency with hey

The two commands below run the hey load generator from a container on the same Docker network (knowledge_network) as the vLLM OpenAI-compatible server, sending 200 requests at a concurrency of 200: first against the chat completions endpoint, then against the plain completions endpoint.
docker run --rm --network=knowledge_network \
  registry.cn-shanghai.aliyuncs.com/zhph-server/hey:latest \
  -n 200 -c 200 -m POST \
  -H "Content-Type: application/json" \
  -H "Authorization: xxx" \
  -d '{
        "model": "codechat",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "stream": false,
        "max_tokens": 100,
        "temperature": 0.0
      }' \
  http://vllm-openai:80/v1/chat/completions
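Before firing 200 concurrent requests, a single request can confirm that the endpoint, auth header, and payload are accepted. This is only a sketch: it assumes an image that ships curl (curlimages/curl is used here) can reach vllm-openai over the same knowledge_network; the model name and body mirror the hey command above.

# Optional sanity check: one request against the same endpoint and model as above
docker run --rm --network=knowledge_network curlimages/curl:latest \
  -s -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: xxx" \
  -d '{"model": "codechat", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 100, "stream": false}' \
  http://vllm-openai:80/v1/chat/completions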
The same load profile against the plain completions endpoint (/v1/completions) with the "codebase" model:

docker run --rm --network=knowledge_network \
  registry.cn-shanghai.aliyuncs.com/zhph-server/hey:latest \
  -n 200 -c 200 -m POST \
  -H "Content-Type: application/json" \
  -H "Authorization: xxx" \
  -d '{
        "model": "codebase",
        "prompt": "# write a python code to print hello world",
        "stream": false,
        "max_tokens": 100,
        "temperature": 0.5
      }' \
  http://vllm-openai:80/v1/completions
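A single run at -c 200 only gives one data point. As a rough sketch, the same test can be repeated at increasing concurrency levels to see where latency starts to climb; the concurrency values below are arbitrary, everything else is copied from the command above.

# Hypothetical sweep over concurrency levels for the completions endpoint
for c in 1 16 64 128 200; do
  echo "=== concurrency: $c ==="
  docker run --rm --network=knowledge_network \
    registry.cn-shanghai.aliyuncs.com/zhph-server/hey:latest \
    -n 200 -c "$c" -m POST \
    -H "Content-Type: application/json" \
    -H "Authorization: xxx" \
    -d '{"model": "codebase", "prompt": "# write a python code to print hello world", "stream": false, "max_tokens": 100, "temperature": 0.5}' \
    http://vllm-openai:80/v1/completions
done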
Results
Summary:
  Total:        2.2220 secs
  Slowest:      1.3603 secs
  Fastest:      0.7641 secs
  Average:      1.0815 secs
  Requests/sec: 43.2034

  Total data:   28992 bytes
  Size/request: 302 bytes

Response time histogram:
  0.764 [1]  |■
  0.824 [5]  |■■■■■■■
  0.883 [4]  |■■■■■■
  0.943 [7]  |■■■■■■■■■■
  1.003 [11] |■■■■■■■■■■■■■■■■
  1.062 [7]  |■■■■■■■■■■
  1.122 [28] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  1.181 [7]  |■■■■■■■■■■
  1.241 [9]  |■■■■■■■■■■■■■
  1.301 [9]  |■■■■■■■■■■■■■
  1.360 [8]  |■■■■■■■■■■■

Latency distribution:
  10% in 0.9175 secs
  25% in 0.9570 secs
  50% in 1.0721 secs
  75% in 1.2131 secs
  90% in 1.2790 secs
  95% in 1.3599 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:  0.0036 secs, 0.7641 secs, 1.3603 secs
  DNS-lookup:  0.0013 secs, 0.0000 secs, 0.0075 secs
  req write:   0.0003 secs, 0.0000 secs, 0.0051 secs
  resp wait:   1.0774 secs, 0.7640 secs, 1.3533 secs
  resp read:   0.0001 secs, 0.0000 secs, 0.0002 secs

Status code distribution:
  [200] 96 responses
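A quick cross-check of the reported figures. Note that only 96 of the 200 requests appear under the status code distribution; hey normally lists failed requests in a separate Error distribution section, which was not captured here.

# Throughput: completed responses / total wall time
echo "scale=4; 96 / 2.2220" | bc   # ≈ 43.20, close to the reported Requests/sec (hey uses the unrounded duration)
# Response size: total data / completed responses
echo "28992 / 96" | bc             # = 302 bytes, matching Size/request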