[xinference] (19): Quickly deploying the CogVideoX-5b model on an L40 GPU with the Xinference framework; it can generate 6-second videos, slightly faster than on a 4090D

2025/7/15 9:37:35

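1. About Xinference

Xorbits Inference (Xinference) is an open-source platform that simplifies running and integrating a wide range of AI models. With Xinference you can run inference with any open-source LLM, embedding model, or multimodal model, in the cloud or on local hardware, and build AI applications on top of it.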

https://inference.readthedocs.io/zh-cn/latest/

The latest version already supports the CogVideoX-5b model:

https://www.modelscope.cn/models/zhipuai/cogvideox-5b


2. Usage


sudo apt install -y python3-pip
pip3 install xinference diffusers imageio imageio-ffmpeg

# Download models from ModelScope (hosted in China); download speed is 10+ MB/s
export XINFERENCE_MODEL_SRC=modelscope
export XINFERENCE_HOME=`pwd`/xinf-data

# Start xinference-local first.
# Optionally prefix the command to restrict which GPUs are visible:
# CUDA_VISIBLE_DEVICES=0,1,2
xinference-local --host 0.0.0.0
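
Because the launch command only works once the server is accepting connections, it can help to wait for the port before continuing. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, not part of Xinference, and the port you pass in is an assumption that depends on how `xinference-local` was started (9997 is Xinference's default when `--port` is not given).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it only checks that something is
    listening on the port, not that the service behind it is fully ready.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, `wait_for_port("127.0.0.1", 9997)` could be called from a setup script before running the launch command.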

The model can be launched without any extra parameters:

xinference launch --model-name CogVideoX-5b --model-type video

3. Runtime monitoring: GPU memory usage reached 40 GB
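
One way to read the memory figure programmatically is to query `nvidia-smi`. This is a rough sketch, assuming the NVIDIA driver tools are installed on the machine; `parse_gpu_memory` and `query_gpu_memory` are hypothetical helper names, and the values are reported in MiB.

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' pairs in MiB, one line per GPU."""
    usage = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        usage.append((used, total))
    return usage


def query_gpu_memory() -> list[tuple[int, int]]:
    """Ask nvidia-smi for per-GPU memory use (requires nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

Calling `query_gpu_memory()` while a video is being generated returns one `(used, total)` pair per visible GPU.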


Test script used to check generation speed:

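from xinference.client import Client
import base64
import time

# The port must match the one xinference-local listens on
# (6006 here; Xinference's default is 9997 unless --port is given).
client = Client("http://0.0.0.0:6006")

# Get a handle to the launched model and request a video from a text prompt
model = client.get_model("CogVideoX-5b")
input_text = "an apple"
out = model.text_to_video(input_text)

# The response carries the video as base64; decode it and save it as .mp4
video_data = base64.b64decode(out['data'][0]['b64_json'])
with open('./' + str(time.time()) + '.mp4', 'wb') as fout:
    fout.write(video_data)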
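
The script above saves the video but does not record how long generation took. A minimal sketch of how the call could be timed and the result saved is shown below; `timed_call` and `save_video` are hypothetical helpers introduced for illustration, and the response layout (`data[0]['b64_json']`) is taken from the script above.

```python
import base64
import time
from pathlib import Path


def timed_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def save_video(response: dict, out_dir: str = ".") -> Path:
    """Decode the first base64-encoded video in the response and write it as .mp4."""
    video_bytes = base64.b64decode(response["data"][0]["b64_json"])
    out_path = Path(out_dir) / f"{int(time.time())}.mp4"
    out_path.write_bytes(video_bytes)
    return out_path
```

With the `model` object from the script, this would be used as `out, seconds = timed_call(model.text_to_video, "an apple")` followed by `save_video(out)`, with `seconds` giving the wall-clock generation time.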