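Using the MMDeploy Prebuilt Package on Windows
Since MMDeploy reached v1, installing and using it has become much easier. This tutorial uses the Faster R-CNN model from MMDetection as an example: the PyTorch model is converted to ONNX, then to a TensorRT engine, and deployed on the TensorRT backend for efficient inference. It mainly follows the official documentation.
Note: this tutorial was written against MMDeploy v1.2.0.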
Local environment
- Windows 11
- PowerShell 7
- Visual Studio 2019
- CUDA: 11.7
- cuDNN: 8.6
- Python: 3.8
- PyTorch: 1.13.1
- TensorRT: 8.5.3.1
- mmdeploy: v1.2.0
- mmdet: v3.0.0
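1. Prepare the environment
Each of these steps is well covered by tutorials online, so only the key points are listed here.
- Install Visual Studio 2019 and select the "Desktop development with C++" workload. Make sure the Windows 10 SDK is selected. VS2022 does not appear to be supported yet.
- Install CUDA and cuDNN.
  - Mind the version compatibility between them.
  - Install VS2019 first. Otherwise the Visual Studio Integration component cannot be installed, and later steps will fail.
  - The default installation options are fine. If you use a custom installation, make sure Visual Studio Integration is checked.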
- Install Anaconda3/Miniconda3, then create and activate an environment:
conda create -n faster-rcnn-deploy python=3.8 -y
conda activate faster-rcnn-deploy
- Install the GPU build of PyTorch:
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
- Install OpenCV-Python:
pip install opencv-python
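Before moving on, it can be worth confirming that the installed PyTorch really is the CUDA build and can see the GPU. The following is a minimal sketch, not part of the original workflow: `cuda_tag` and `describe_torch` are hypothetical helper names, and the script only reports what it finds.

```python
import re
from typing import Optional


def cuda_tag(version: str) -> Optional[str]:
    """Return the CUDA tag (e.g. 'cu117') from a PyTorch version string.

    Returns None when the version carries no '+cuXXX' suffix (e.g. a CPU build).
    """
    match = re.search(r"\+(cu\d+)$", version)
    return match.group(1) if match else None


def describe_torch() -> str:
    """Summarize the installed PyTorch build, or say that it is missing."""
    try:
        import torch  # imported lazily so the script also runs without torch
    except ImportError:
        return "torch is not installed in this environment"
    version = str(torch.__version__)
    return (
        f"torch {version}, CUDA build tag: {cuda_tag(version)}, "
        f"GPU visible: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch())
```

With the pip command above, the tag should be `cu117`. A tag of `None` would suggest that a CPU-only wheel was installed instead.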
2. Install TensorRT
Log in to the NVIDIA website and download it. This is the link I used:
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/8.5.3/zip/TensorRT-8.5.3.1.Windows10.x86_64.cuda-11.8.cudnn8.6.zip
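After the download finishes, unzip the archive and enter the extracted folder.
- Create a user or system environment variable named TENSORRT_DIR whose value is the path of this folder.
- Restart PowerShell and activate the environment. The TensorRT installation directory is now available as $env:TENSORRT_DIR.
- Add $env:TENSORRT_DIR\lib to PATH.
- Restart PowerShell again and activate the environment.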
- Install the wheel that matches your Python version:
pip install $env:TENSORRT_DIR\python\tensorrt-8.5.3.1-cp38-none-win_amd64.whl
- Install pycuda:
pip install pycuda
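A missing or misspelled environment variable is an easy mistake to make here, so a quick check can save time. This is a small sketch under the assumptions of this section (a `TENSORRT_DIR` variable and its `lib` folder on `PATH`); `tensorrt_env_problems` is a hypothetical helper, not something shipped with TensorRT or MMDeploy.

```python
import os
from pathlib import Path
from typing import List, Mapping


def tensorrt_env_problems(environ: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems with the TensorRT variables."""
    trt_dir = environ.get("TENSORRT_DIR")
    if not trt_dir:
        return ["TENSORRT_DIR is not set"]

    problems = []
    lib_dir = Path(trt_dir) / "lib"
    if not lib_dir.is_dir():
        problems.append(f"{lib_dir} does not exist")

    # Compare as Path objects so trailing separators do not matter.
    path_entries = [Path(p) for p in environ.get("PATH", "").split(os.pathsep) if p]
    if lib_dir not in path_entries:
        problems.append(f"{lib_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    found = tensorrt_env_problems(os.environ)
    print("TensorRT environment looks fine" if not found else "\n".join(found))
```

Run it in a freshly opened PowerShell with the conda environment activated, since variables set through the system dialog are only visible to new shells.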
3. Install mmdeploy and the runtime
- mmdeploy: the model conversion API
- runtime: the model inference API
pip install mmdeploy==1.2.0
pip install mmdeploy-runtime-gpu==1.2.0
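4. Clone the MMDeploy repository
Create a new folder. All repositories and files used below are placed in it.
The mmdeploy repository is cloned mainly for the deployment config files it contains.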
git clone -b main https://github.com/open-mmlab/mmdeploy.git
5. Install MMDetection
MMCV must be installed first:
pip install -U openmim
mim install "mmcv>=2.0.0rc2"
Clone mmdet and install it from source:
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
git checkout v3.0.0
pip install -v -e .
cd ..
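At this point all Python packages are in place, so the versions can be compared against the ones this tutorial was written with. The snippet below is only a sketch: the `EXPECTED` table copies the versions listed at the top of this post, and `installed_version` and `compare_versions` are hypothetical helpers built on the standard `importlib.metadata` module.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions used in this tutorial; adjust if your setup differs.
EXPECTED = {
    "torch": "1.13.1",
    "tensorrt": "8.5.3.1",
    "mmdeploy": "1.2.0",
    "mmdet": "3.0.0",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Report 'ok', 'missing', or a mismatch message for each package."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        elif found.split("+")[0] == wanted:  # ignore local tags such as +cu117
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for name, status in compare_versions(EXPECTED).items():
        print(f"{name}: {status}")
```

A mismatch is not necessarily fatal, but it is the first thing I would look at if a later step fails.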
6. Convert the model
The directory layout is as follows:
./faster-rcnn-deploy/
├── app.py
├── checkpoints
├── convert.py
├── infer.py
├── mmdeploy
├── mmdeploy_model
├── mmdetection
├── output_detection.png
└── tmp.py
- Deployment config: mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py
- Model config: mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py
- Model checkpoint: checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth. These are the pretrained weights released by OpenMMLab. Paste the URL below into a browser, or download it with wget on Windows:
wget -P checkpoints https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
- Test image: mmdetection/demo/demo.jpg
- Output directory: mmdeploy_model/faster-rcnn-deploy-fp16
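The conversion takes a while, so it helps to confirm that every input listed above exists before starting it. Here is a minimal pre-flight sketch, assuming it is run from the top-level faster-rcnn-deploy folder; `missing_inputs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List

# Inputs listed above, relative to the top-level working folder.
REQUIRED_INPUTS = [
    "mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py",
    "mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py",
    "checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth",
    "mmdetection/demo/demo.jpg",
]


def missing_inputs(root: Path, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist under root."""
    return [p for p in relative_paths if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_inputs(Path("."), REQUIRED_INPUTS)
    if missing:
        print("Missing inputs:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All conversion inputs are present")
```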
The contents of convert.py are as follows:
from mmdeploy.apis import torch2onnx
from mmdeploy.apis.tensorrt import onnx2tensorrt
from mmdeploy.backend.sdk.export_info import export2SDK
import os
img = "mmdetection/demo/demo.jpg"
work_dir = "mmdeploy_model/faster-rcnn-deploy-fp16"
save_file = "end2end.onnx"
deploy_cfg = "mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py"
model_cfg = "mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py"
model_checkpoint = "checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth"
device = "cuda"
# 1. convert model to IR(onnx)
torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg, model_checkpoint, device)
# 2. convert IR to tensorrt
onnx_model = os.path.join(work_dir, save_file)
save_file = "end2end.engine"
model_id = 0
device = "cuda"
onnx2tensorrt(work_dir, save_file, model_id, deploy_cfg, onnx_model, device)
# 3. extract pipeline info for sdk use (dump-info)
export2SDK(deploy_cfg, model_cfg, work_dir, pth=model_checkpoint, device=device)
Output:
[08/30/2023-17:36:13] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +84, GPU +109, now: CPU 84, GPU 109 (MiB)
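The log alone does not say much, so I like to look at what actually landed in the output directory. The sketch below lists the files I would expect after convert.py finishes. The ONNX and engine names come from the script above. The three JSON names are my assumption about what export2SDK writes, so treat them as such. `conversion_artifacts` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, Optional

# end2end.* names come from convert.py; the JSON names are an assumption
# about the files written by export2SDK.
EXPECTED_ARTIFACTS = [
    "end2end.onnx",
    "end2end.engine",
    "deploy.json",
    "pipeline.json",
    "detail.json",
]


def conversion_artifacts(work_dir: Path) -> Dict[str, Optional[int]]:
    """Map each expected file name to its size in bytes, or None if missing."""
    sizes = {}
    for name in EXPECTED_ARTIFACTS:
        path = work_dir / name
        sizes[name] = path.stat().st_size if path.is_file() else None
    return sizes


if __name__ == "__main__":
    work_dir = Path("mmdeploy_model/faster-rcnn-deploy-fp16")
    for name, size in conversion_artifacts(work_dir).items():
        status = "missing" if size is None else f"{size} bytes"
        print(f"{name}: {status}")
```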
7. Run a test inference
The contents of infer.py are as follows:
from mmdeploy.apis import inference_model
deploy_cfg = "mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py"
model_cfg = "mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py"
backend_files = ["mmdeploy_model/faster-rcnn-deploy-fp16/end2end.engine"]
img = "mmdetection/demo/demo.jpg"
device = "cuda"
result = inference_model(model_cfg, deploy_cfg, backend_files, img, device)
print(result)
Output:
08/30 17:42:43 - mmengine - INFO - Successfully loaded tensorrt plugins from F:\miniconda3\envs\faster-rcnn-deploy\lib\site-packages\mmdeploy\lib\mmdeploy_tensorrt_ops.dll
08/30 17:42:43 - mmengine - INFO - Successfully loaded tensorrt plugins from F:\miniconda3\envs\faster-rcnn-deploy\lib\site-packages\mmdeploy\lib\mmdeploy_tensorrt_ops.dll
...
...
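inference_model loads the model on every call, which is very inefficient. It is only meant for checking that the converted model works and should not be used in production. For efficient use, integrate Detector into your own application: load once, infer many times.
8. Integrate the detector into your own application
The contents of app.py are as follows: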
from mmdeploy_runtime import Detector
import cv2

# Read the image
img = cv2.imread("mmdetection/demo/demo.jpg")
# Create the detector (the model is loaded once here)
detector = Detector(
    model_path="mmdeploy_model/faster-rcnn-deploy-fp16",
    device_name="cuda",
    device_id=0,
)
# Run inference
bboxes, labels, _ = detector(img)
# Filter the results by score threshold and draw them on the original image
for bbox, label_id in zip(bboxes, labels):
    [left, top, right, bottom], score = bbox[0:4].astype(int), bbox[4]
    if score < 0.3:
        continue
    cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0))
cv2.imwrite("output_detection.png", img)
With this API, a trained deep learning model can be integrated into a web backend: load once, infer many times.
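To make the "load once, infer many times" idea concrete, here is one possible shape for such a backend component. It is a sketch, not the way MMDeploy prescribes it: `DetectionService` is a hypothetical class, and it takes the detector as a constructor argument so the same object is reused for every request. The output format (a list of dicts) is also just my choice, picked because it is easy to serialize as JSON.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Any callable with the same return shape as the detector used in app.py:
# (bboxes, labels, masks), where each bbox is [left, top, right, bottom, score].
DetectorFn = Callable[[Any], Tuple[Sequence, Sequence, Any]]


class DetectionService:
    """Keep one detector for the lifetime of the process and reuse it."""

    def __init__(self, detector: DetectorFn, score_thr: float = 0.3) -> None:
        self._detector = detector
        self._score_thr = score_thr

    def detect(self, img: Any) -> List[Dict[str, Any]]:
        """Run the detector and return JSON-friendly records above the threshold."""
        bboxes, labels, _ = self._detector(img)
        records = []
        for bbox, label_id in zip(bboxes, labels):
            score = float(bbox[4])
            if score < self._score_thr:
                continue
            left, top, right, bottom = (int(v) for v in bbox[:4])
            records.append(
                {
                    "label_id": int(label_id),
                    "score": score,
                    "box": [left, top, right, bottom],
                }
            )
        return records
```

In a real application the service would be created once at startup, for example `service = DetectionService(Detector(model_path="mmdeploy_model/faster-rcnn-deploy-fp16", device_name="cuda", device_id=0))`, and every request handler would then call `service.detect(img)`.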
Original image:
After detection: