Running Qwen2-VL Locally
- 1. Clone the code
- 2. Create a virtual environment
- 3. Install the dependencies
- 4. Launch the server
- 5. Call the API
1. Clone the code
git clone https://github.com/QwenLM/Qwen2-VL.git
cd Qwen2-VL
2. Create a virtual environment
conda create -n qwen2-vl python=3.11 -y
conda activate qwen2-vl
3. Install the dependencies
pip install git+https://github.com/huggingface/transformers accelerate
pip install qwen-vl-utils
pip install deepspeed
pip install flash-attn --no-build-isolation
pip install einops==0.8.0
pip install git+https://github.com/fyabc/vllm.git@add_qwen2_vl_new
The last command installs a vLLM fork whose add_qwen2_vl_new branch adds Qwen2-VL support.
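Before launching, it can be worth confirming the installs succeeded. A minimal sketch (not part of the official setup) that checks whether each module is importable:

```python
# Sanity check: confirm the modules installed above are importable.
import importlib.util

def check_installed(modules):
    """Map each module name to True if it can be imported, False otherwise."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

# Note: pip package names and import names can differ
# (e.g. qwen-vl-utils is imported as qwen_vl_utils).
for name, ok in check_installed(
    ["transformers", "accelerate", "qwen_vl_utils", "deepspeed", "vllm"]
).items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```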
4. Launch the server
python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-VL-7B-Instruct --model Qwen/Qwen2-VL-7B-Instruct
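Downloading and loading the 7B model can take a while, so requests sent too early will fail. A small stdlib-only helper (a sketch, assuming the default localhost:8000 address used by the command above) can poll the server's /v1/models endpoint until it is ready:

```python
import json
import time
import urllib.error
import urllib.request

def model_ids(payload):
    """Extract the model ids from a /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]

def wait_for_server(base_url="http://localhost:8000/v1", retries=30, delay=2.0):
    """Poll /v1/models until the vLLM server answers, then return the model ids."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                return model_ids(json.load(resp))
        except (urllib.error.URLError, OSError):
            time.sleep(delay)
    raise RuntimeError("vLLM server did not become ready in time")

# After launch, wait_for_server() should report the name passed via
# --served-model-name, i.e. ['Qwen2-VL-7B-Instruct'].
```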
5. Call the API
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2-VL-7B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
        {"type": "text", "text": "What is the text in the image?"}
      ]}
    ]
  }'
Alternatively, call the server with the OpenAI Python client:

from openai import OpenAI

# Point the OpenAI client at vLLM's local API server; no real key is needed.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
chat_response = client.chat.completions.create(
    model="Qwen2-VL-7B-Instruct",  # must match --served-model-name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"
                    },
                },
                {"type": "text", "text": "What is the text in the image?"},
            ],
        },
    ],
)
print("Chat response:", chat_response)
Done!
References:
- https://github.com/QwenLM/Qwen2-VL
- https://help.aliyun.com/zh/model-studio/developer-reference/qwen-vl-api#2166c1d8b3i5r