Calling MMDetection from Python for AI Background Removal

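This article builds on "Object Detection, Instance and Panoptic Segmentation with MMDetection". Before reading on, set up a working MMDetection environment and work through the object detection, instance segmentation, and panoptic segmentation exercises from that article.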

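To remove the background of an image with AI, we use an instance segmentation model from the model zoo of MMDetection (an OpenMMLab project). It handles the core task for us: classifying and segmenting the objects in the image.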

Most of the work below consists of writing and debugging code in demo.py. Start with the following code:

import cv2
import mmcv
from copy import deepcopy
from mmdet.apis import init_detector, inference_detector

config_file = 'configs/scnet/scnet_r50_fpn_1x_coco.py'  # path to the config file
checkpoint_file = 'checkpoints/scnet/scnet_r50_fpn_1x_coco-c3f09857.pth'  # path to the pretrained checkpoint
device = 'cpu'  # 'cpu', or a CUDA device such as 'cuda:0'
model = init_detector(config_file, checkpoint_file, device=device)  # build the model
img_raw = mmcv.imread('demo/demo.jpg')  # image to run inference on, as a numpy.ndarray

img_h = img_raw.shape[0]  # image height in pixels
img_w = img_raw.shape[1]  # image width in pixels

result = inference_detector(model, img_raw)  # inference result
img_result = model.show_result(img_raw, result)  # result image as a numpy.ndarray
model.show_result(img_raw, result, out_file='demo/demo_result.jpg')  # save the result image to a file

After running demo.py, the file demo/demo_result.jpg is generated in the project:

(Image: demo_result)

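So far we have visualized the inference result by saving it to a file. Now we parse the result itself. The checkpoint name scnet_r50_fpn_1x_coco-c3f09857.pth tells us that the model was pretrained on the COCO dataset, so the inference result has to be parsed according to COCO's category definitions.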

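Continue in demo.py and define a dictionary, coco_80_dict, that will store and help parse the inference result (the describe field holds the Chinese name of each category):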

coco_80_dict = {
    'person': {'index': 0, 'id': 1, 'describe': '人'},
    'bicycle': {'index': 1, 'id': 2, 'describe': '自行车'},
    'car': {'index': 2, 'id': 3, 'describe': '车'},
    'motorcycle': {'index': 3, 'id': 4, 'describe': '摩托车'},
    'airplane': {'index': 4, 'id': 5, 'describe': '飞机'},
    'bus': {'index': 5, 'id': 6, 'describe': '公交车'},
    'train': {'index': 6, 'id': 7, 'describe': '火车'},
    'truck': {'index': 7, 'id': 8, 'describe': '卡车'},
    'boat': {'index': 8, 'id': 9, 'describe': '船'},
    'traffic light': {'index': 9, 'id': 10, 'describe': '红绿灯'},
    'fire hydrant': {'index': 10, 'id': 11, 'describe': '消防栓'},
    'stop sign': {'index': 11, 'id': 13, 'describe': '停车标志'},
    'parking meter': {'index': 12, 'id': 14, 'describe': '停车收费表'},
    'bench': {'index': 13, 'id': 15, 'describe': '板凳'},
    'bird': {'index': 14, 'id': 16, 'describe': '鸟'},
    'cat': {'index': 15, 'id': 17, 'describe': '猫'},
    'dog': {'index': 16, 'id': 18, 'describe': '狗'},
    'horse': {'index': 17, 'id': 19, 'describe': '马'},
    'sheep': {'index': 18, 'id': 20, 'describe': '羊'},
    'cow': {'index': 19, 'id': 21, 'describe': '牛'},
    'elephant': {'index': 20, 'id': 22, 'describe': '大象'},
    'bear': {'index': 21, 'id': 23, 'describe': '熊'},
    'zebra': {'index': 22, 'id': 24, 'describe': '斑马'},
    'giraffe': {'index': 23, 'id': 25, 'describe': '长颈鹿'},
    'backpack': {'index': 24, 'id': 27, 'describe': '背包'},
    'umbrella': {'index': 25, 'id': 28, 'describe': '雨伞'},
    'handbag': {'index': 26, 'id': 31, 'describe': '手提包'},
    'tie': {'index': 27, 'id': 32, 'describe': '领带'},
    'suitcase': {'index': 28, 'id': 33, 'describe': '手提箱'},
    'frisbee': {'index': 29, 'id': 34, 'describe': '飞盘'},
    'skis': {'index': 30, 'id': 35, 'describe': '滑雪板'},
    'snowboard': {'index': 31, 'id': 36, 'describe': '单板滑雪板'},
    'sports ball': {'index': 32, 'id': 37, 'describe': '运动球'},
    'kite': {'index': 33, 'id': 38, 'describe': '风筝'},
    'baseball bat': {'index': 34, 'id': 39, 'describe': '棒球棒'},
    'baseball glove': {'index': 35, 'id': 40, 'describe': '棒球手套'},
    'skateboard': {'index': 36, 'id': 41, 'describe': '滑板'},
    'surfboard': {'index': 37, 'id': 42, 'describe': '冲浪板'},
    'tennis racket': {'index': 38, 'id': 43, 'describe': '网球拍'},
    'bottle': {'index': 39, 'id': 44, 'describe': '瓶'},
    'wine glass': {'index': 40, 'id': 46, 'describe': '酒杯'},
    'cup': {'index': 41, 'id': 47, 'describe': '杯'},
    'fork': {'index': 42, 'id': 48, 'describe': '叉子'},
    'knife': {'index': 43, 'id': 49, 'describe': '刀'},
    'spoon': {'index': 44, 'id': 50, 'describe': '汤匙'},
    'bowl': {'index': 45, 'id': 51, 'describe': '碗'},
    'banana': {'index': 46, 'id': 52, 'describe': '香蕉'},
    'apple': {'index': 47, 'id': 53, 'describe': '苹果'},
    'sandwich': {'index': 48, 'id': 54, 'describe': '三明治'},
    'orange': {'index': 49, 'id': 55, 'describe': '橙'},
    'broccoli': {'index': 50, 'id': 56, 'describe': '西兰花'},
    'carrot': {'index': 51, 'id': 57, 'describe': '胡萝卜'},
    'hot dog': {'index': 52, 'id': 58, 'describe': '热狗'},
    'pizza': {'index': 53, 'id': 59, 'describe': '披萨'},
    'donut': {'index': 54, 'id': 60, 'describe': '甜甜圈'},
    'cake': {'index': 55, 'id': 61, 'describe': '蛋糕'},
    'chair': {'index': 56, 'id': 62, 'describe': '椅子'},
    'couch': {'index': 57, 'id': 63, 'describe': '沙发'},
    'potted plant': {'index': 58, 'id': 64, 'describe': '盆栽植物'},
    'bed': {'index': 59, 'id': 65, 'describe': '床'},
    'dining table': {'index': 60, 'id': 67, 'describe': '餐桌'},
    'toilet': {'index': 61, 'id': 70, 'describe': '厕所'},
    'tv': {'index': 62, 'id': 72, 'describe': '电视'},
    'laptop': {'index': 63, 'id': 73, 'describe': '笔记本电脑'},
    'mouse': {'index': 64, 'id': 74, 'describe': '鼠标'},
    'remote': {'index': 65, 'id': 75, 'describe': '遥控器'},
    'keyboard': {'index': 66, 'id': 76, 'describe': '键盘'},
    'cell phone': {'index': 67, 'id': 77, 'describe': '手机'},
    'microwave': {'index': 68, 'id': 78, 'describe': '微波炉'},
    'oven': {'index': 69, 'id': 79, 'describe': '烤箱'},
    'toaster': {'index': 70, 'id': 80, 'describe': '烤面包机'},
    'sink': {'index': 71, 'id': 81, 'describe': '水槽'},
    'refrigerator': {'index': 72, 'id': 82, 'describe': '冰箱'},
    'book': {'index': 73, 'id': 84, 'describe': '书'},
    'clock': {'index': 74, 'id': 85, 'describe': '时钟'},
    'vase': {'index': 75, 'id': 86, 'describe': '花瓶'},
    'scissors': {'index': 76, 'id': 87, 'describe': '剪刀'},
    'teddy bear': {'index': 77, 'id': 88, 'describe': '泰迪熊'},
    'hair drier': {'index': 78, 'id': 89, 'describe': '吹风机'},
    'toothbrush': {'index': 79, 'id': 90, 'describe': '牙刷'},
}
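
The index field is the position of a class in the model's output lists (0 to 79), while id is the original COCO category id, which has gaps (the table above jumps from 11 to 13, for example). A minimal sketch of reverse lookups follows. It uses a three-entry excerpt of the dictionary so that it runs on its own; the names coco_excerpt, index_to_name and id_to_name are introduced here for illustration only.

```python
# Three entries copied from coco_80_dict; the full dictionary works the same way.
coco_excerpt = {
    'person': {'index': 0, 'id': 1, 'describe': '人'},
    'stop sign': {'index': 11, 'id': 13, 'describe': '停车标志'},
    'bench': {'index': 13, 'id': 15, 'describe': '板凳'},
}

# Map the model's output position (index) back to the class name.
index_to_name = {value['index']: name for name, value in coco_excerpt.items()}

# Map the COCO category id (as used in COCO annotation files) back to the class name.
id_to_name = {value['id']: name for name, value in coco_excerpt.items()}

print(index_to_name[13])  # bench
print(id_to_name[13])     # stop sign
```

The same number can therefore mean different classes depending on which field it refers to, which is why the dictionary keeps both.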

The inference result `result` has length 2 (len(result) == 2), where:

The detection boxes result[0] always have length 80 (len(result[0]) == 80), because the part of the COCO dataset used for detection and segmentation has 80 categories.

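How to read the detection boxes
- Each element of result[0] is an array of shape (n, 5) for one category; each row has length 5 and holds [x1, y1, x2, y2, confidence]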

Example detection box data

[
    array([
        [2.54207230e+02, 1.04094536e+02, 2.62954681e+02, 1.12405502e+02, 7.87650794e-02],
        [3.75357147e+02, 1.20013657e+02, 3.81758453e+02, 1.33114960e+02, 6.91782460e-02],
        [5.32841248e+02, 1.09704865e+02, 5.40237061e+02, 1.24966133e+02, 1.59687027e-02],
        [6.22804871e+02, 1.05815277e+02, 6.35152161e+02, 1.15566162e+02, 5.09093935e-03]
    ], dtype=float32),
    array([], shape=(0, 5), dtype=float32),
    array([
        [4.81168701e+02, 1.10373505e+02, 5.23180786e+02, 1.30380783e+02, 9.89862204e-01],
        [4.32057190e+02, 1.04940285e+02, 4.83949829e+02, 1.32065140e+02, 9.75807786e-01],
        [8.68552148e-01, 1.12007828e+02, 6.18107605e+01, 1.44267593e+02, 9.75750148e-01],
        [6.08653687e+02, 1.11829590e+02, 6.35857971e+02, 1.37511154e+02, 9.74755704e-01],
        [2.66471069e+02, 1.05501320e+02, 3.26720917e+02, 1.27970596e+02, 9.69615817e-01],
        [1.91124908e+02, 1.08998817e+02, 2.99086731e+02, 1.55649002e+02, 9.68829334e-01],
        [3.98783081e+02, 1.11128929e+02, 4.33258514e+02, 1.33113083e+02, 9.66077983e-01],
        ......
        [2.54207230e+02, 1.04094536e+02, 2.62954681e+02, 1.12405502e+02, 5.26814163e-03]
    ], dtype=float32),
    ......
    array([
        [3.7397647e+02, 1.3353706e+02, 4.3298822e+02, 1.8852022e+02, 2.0611383e-01],
        [2.1776900e+02, 1.7274181e+02, 4.5755392e+02, 3.8911337e+02, 6.3707083e-03]
    ], dtype=float32),
    array([], shape=(0, 5), dtype=float32)
]
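
Many of the boxes above have very low confidence (some below 0.01), so in practice they are usually filtered by a score threshold before use. A minimal sketch with NumPy boolean indexing follows. The helper name filter_boxes and the threshold 0.3 are assumptions for illustration, and the sample rows are rounded values from the example above.

```python
import numpy as np

def filter_boxes(class_boxes, score_thr=0.3):
    """Keep only the rows of an (n, 5) array whose confidence (column 4) is >= score_thr."""
    class_boxes = np.asarray(class_boxes, dtype=np.float32).reshape(-1, 5)
    return class_boxes[class_boxes[:, 4] >= score_thr]

# Rounded rows taken from the example data: [x1, y1, x2, y2, confidence]
sample = np.array([
    [481.17, 110.37, 523.18, 130.38, 0.9899],
    [432.06, 104.94, 483.95, 132.07, 0.9758],
    [254.21, 104.09, 262.95, 112.41, 0.0053],
], dtype=np.float32)

kept = filter_boxes(sample)
print(len(kept))  # 2

# A class without detections stays an empty (0, 5) array.
empty = filter_boxes(np.zeros((0, 5), dtype=np.float32))
print(empty.shape)  # (0, 5)
```

Note that filtering the boxes of a class changes their positions, so the matching masks in result[1] would have to be filtered with the same condition to stay aligned.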

The segmentation masks result[1] also always have length 80 (len(result[1]) == 80), again one entry per COCO category.

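How to read the segmentation masks: each mask is a 2-D array
- The first dimension is the image height
- The second dimension is the image width
- Each bool value says whether the pixel at index [h][w] belongs to the detected object. For background removal:
  - True means the pixel at [h][w] should be kept
  - False means the pixel at [h][w] should be removed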

Example segmentation mask data

[
    [
        array([
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False],
            ...,
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False]
        ]),
        array([
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False],
            ...,
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False]
        ]),
        array([
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False],
            ...,
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False],
            [False, False, False, ..., False, False, False]
        ])
    ],
    ...,
    [],
    []
]
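
Because each mask is a boolean array with the same height and width as the image, simple NumPy operations are enough to inspect it. A minimal sketch that counts the object's pixels and derives a tight bounding box from the mask follows. The 6x8 toy mask and the helper name mask_bbox are made up for illustration.

```python
import numpy as np

def mask_bbox(mask):
    """Return (x1, y1, x2, y2) of the True region, inclusive, or None for an empty mask."""
    rows = np.flatnonzero(mask.any(axis=1))  # row indices (h) that contain True
    cols = np.flatnonzero(mask.any(axis=0))  # column indices (w) that contain True
    if rows.size == 0:
        return None
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])

mask = np.zeros((6, 8), dtype=bool)  # height 6, width 8
mask[2:5, 3:7] = True                # rows 2..4, columns 3..6

area = int(mask.sum())               # True counts as 1, so the sum is the pixel count
print(area)             # 12
print(mask_bbox(mask))  # (3, 2, 6, 4)
```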

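The inference result is organized as result[0 = detection boxes, 1 = segmentation masks][category index in COCO's 80-class order]. This layout is inconvenient for extracting the data needed for background removal, so we unpack `result` and store its contents in the coco_80_dict dictionary defined earlier.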

Continue in demo.py: unpack `result` and store the data in coco_80_dict, in a structure that makes it easy to extract what background removal needs:

for cc_key, cc_value in coco_80_dict.items():
    coco_80_dict[cc_key]['number'] = len(result[0][cc_value['index']])  # number of detected instances of this class
    coco_80_dict[cc_key]['detection'] = result[0][cc_value['index']]  # detection boxes, shape (n, 5)
    coco_80_dict[cc_key]['segmentation'] = result[1][cc_value['index']]  # list of n boolean masks

coco_80_dict now has the following structure:

{
    'person': {
        'index': 0,
        'id': 1,
        'describe': '人',
        'number': 4,
        'detection': detection boxes, list[[x1, y1, x2, y2, confidence]],
        'segmentation': segmentation masks, list[[h][w] = whether the pixel belongs to the detected object]
    },
    ......
    'toothbrush': {
        'index': 79,
        'id': 90,
        'describe': '牙刷',
        'number': 0,
        'detection': array([], shape=(0, 5), dtype=float32),
        'segmentation': []
    }
}
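
With the results grouped by class name, picking out instances becomes a dictionary lookup. A minimal sketch that lists the classes with at least one detection and selects the highest-confidence instance of a class follows. The two-class toy_dict, its values, and the helper best_instance are hypothetical stand-ins for the real coco_80_dict.

```python
def best_instance(entry):
    """Return the position of the highest-confidence detection, or None if there is none."""
    detections = entry['detection']
    if len(detections) == 0:
        return None
    return max(range(len(detections)), key=lambda i: detections[i][4])

# Hypothetical stand-in for coco_80_dict. Plain lists are used so the sketch has no
# dependencies; row indexing works the same way on the numpy arrays shown above.
toy_dict = {
    'bench': {
        'number': 2,
        'detection': [[10, 10, 50, 40, 0.21], [200, 170, 450, 380, 0.99]],
        'segmentation': ['mask_0', 'mask_1'],  # placeholders for the boolean mask arrays
    },
    'toothbrush': {'number': 0, 'detection': [], 'segmentation': []},
}

detected = [name for name, entry in toy_dict.items() if entry['number'] > 0]
print(detected)  # ['bench']

best = best_instance(toy_dict['bench'])
print(best, toy_dict['bench']['segmentation'][best])  # 1 mask_1
```

Checking number (or the return value of best_instance) before indexing with [0] avoids an IndexError when the image contains no instance of the requested class.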

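Next, taking the bench in the center of the image as an example, we visualize its detection box and its segmentation mask as image files. We start with the detection box.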

Continue in demo.py: read the detection box of the first bench ( coco_80_dict['bench']['detection'][0] ) and visualize it:

img_detection = deepcopy(img_raw)  # deep copy so the original numpy.ndarray stays untouched
img_detection_data = coco_80_dict['bench']['detection'][0]  # detection box of the first bench
# Clamp the coordinates to the image so the indexing below cannot go out of bounds
img_detection_data_x1 = max(int(img_detection_data[0]), 0)
img_detection_data_y1 = max(int(img_detection_data[1]), 0)
img_detection_data_x2 = min(int(img_detection_data[2]), img_w - 1)
img_detection_data_y2 = min(int(img_detection_data[3]), img_h - 1)
print(f'Confidence = {img_detection_data[4]*100}%')  # drawing text is more work, so just print it

for x_i in range(img_detection_data_x1, img_detection_data_x2 + 1):
    img_detection[img_detection_data_y1, x_i] = [0,0,255]  # [0,0,255] is red in BGR channel order
    img_detection[img_detection_data_y2, x_i] = [0,0,255]

for y_i in range(img_detection_data_y1, img_detection_data_y2 + 1):
    img_detection[y_i, img_detection_data_x1] = [0,0,255]
    img_detection[y_i, img_detection_data_x2] = [0,0,255]

cv2.imwrite('demo/demo_result_detection.jpg', img_detection)

After running demo.py, the console prints the confidence of the box (99.38850998878479% in this run), and the file demo/demo_result_detection.jpg is generated in the project:

(Image: demo_result_detection)
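
The pixel-by-pixel loops can also be expressed with NumPy slice assignment, which draws each edge in one statement. A minimal sketch on a small black test image follows. The helper name draw_box is an assumption for illustration, and the coordinates are clamped so that a box touching the image border does not raise an IndexError. OpenCV's cv2.rectangle is a ready-made alternative for the same task.

```python
import numpy as np

def draw_box(img, x1, y1, x2, y2, color=(0, 0, 255)):
    """Draw a 1-pixel rectangle outline in place; color is BGR, so (0, 0, 255) is red."""
    h, w = img.shape[:2]
    x1, x2 = max(int(x1), 0), min(int(x2), w - 1)
    y1, y2 = max(int(y1), 0), min(int(y2), h - 1)
    img[y1, x1:x2 + 1] = color  # top edge
    img[y2, x1:x2 + 1] = color  # bottom edge
    img[y1:y2 + 1, x1] = color  # left edge
    img[y1:y2 + 1, x2] = color  # right edge
    return img

canvas = np.zeros((10, 12, 3), dtype=np.uint8)  # height 10, width 12, black
draw_box(canvas, 2.7, 3.2, 20.0, 7.9)           # x2 lies outside the image and is clamped to 11

print(canvas[3, 2].tolist())  # [0, 0, 255]
print(canvas[5, 5].tolist())  # [0, 0, 0]
```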

Continue in demo.py: read the segmentation mask of the first bench ( coco_80_dict['bench']['segmentation'][0] ) and visualize it:

img_segmentation = deepcopy(img_raw)  # deep copy so the original numpy.ndarray stays untouched
img_segmentation_data = coco_80_dict['bench']['segmentation'][0]  # segmentation mask of the first bench
for h_i in range(img_h):
    for w_i in range(img_w):
        # The mask's bool value decides whether the pixel is removed (reset to [255,255,255], a pure white background)
        if not img_segmentation_data[h_i, w_i]:
            img_segmentation[h_i, w_i] = [255,255,255]

cv2.imwrite('demo/demo_result_segmentation.jpg', img_segmentation)

After running demo.py, the file demo/demo_result_segmentation.jpg is generated in the project:

(Image: demo_result_segmentation)
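
The nested loop visits every pixel in Python, which is slow for large images. Since the mask is a boolean array with the image's height and width, the same result can be obtained with a single boolean-indexed assignment. A minimal sketch on synthetic data follows (the 4x4 image and mask are made up for illustration).

```python
import numpy as np

img = np.full((4, 4, 3), 100, dtype=np.uint8)  # stand-in for img_raw: uniform gray, BGR
mask = np.zeros((4, 4), dtype=bool)            # stand-in for one segmentation mask
mask[1:3, 1:3] = True                          # the "object" is the 2x2 center

cutout = img.copy()              # keep the original image untouched
cutout[~mask] = [255, 255, 255]  # every pixel outside the mask becomes white

print(cutout[0, 0].tolist())  # [255, 255, 255]
print(cutout[1, 1].tolist())  # [100, 100, 100]
```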

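The final step of background removal is to turn the current 3-channel image data into 4-channel data (adding an alpha channel for transparency) and write out an image file with a transparent background.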

Continue in demo.py: read the segmentation mask of the first bench ( coco_80_dict['bench']['segmentation'][0] ), add the alpha channel, and then visualize the result:

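import numpy as np  # needed for np.full and np.append below; can be moved up to the other imports

img_segmentation_png_raw = deepcopy(img_raw)  # deep copy of the original numpy.ndarray
img_segmentation_png = np.full((img_h, img_w, 4), 0, dtype='uint8')  # (height, width, 4 channels) array, all zeros = fully transparent
img_segmentation_png_data = coco_80_dict['bench']['segmentation'][0]  # segmentation mask of the first bench
for h_i in range(img_h):
    for w_i in range(img_w):
        # Pixels inside the mask keep their BGR color and get alpha 255 (fully opaque)
        if img_segmentation_png_data[h_i, w_i]:
            img_segmentation_png[h_i, w_i] = np.append(img_segmentation_png_raw[h_i, w_i], 255)

cv2.imwrite('demo/demo_result_segmentation_png.png', img_segmentation_png)  # PNG keeps the alpha channel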
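
The alpha channel can likewise be built without a per-pixel loop: start from a fully transparent 4-channel array and copy only the masked pixels with full opacity. A minimal sketch on synthetic data follows. The helper name to_bgra_cutout is an assumption for illustration.

```python
import numpy as np

def to_bgra_cutout(img_bgr, mask):
    """Return an (H, W, 4) uint8 array: masked pixels keep their color with alpha 255,
    all other pixels are fully transparent (all four channels 0)."""
    h, w = mask.shape
    out = np.zeros((h, w, 4), dtype=np.uint8)
    out[mask, :3] = img_bgr[mask]  # copy B, G, R of the object pixels
    out[mask, 3] = 255             # make the object pixels opaque
    return out

img = np.full((4, 4, 3), 100, dtype=np.uint8)  # stand-in for img_raw
mask = np.zeros((4, 4), dtype=bool)            # stand-in for one segmentation mask
mask[1:3, 1:3] = True

bgra = to_bgra_cutout(img, mask)
print(bgra.shape)           # (4, 4, 4)
print(bgra[1, 1].tolist())  # [100, 100, 100, 255]
print(bgra[0, 0].tolist())  # [0, 0, 0, 0]
```

The resulting array can be written to a .png path with cv2.imwrite, as in the code above. The output has to be PNG rather than JPEG, because JPEG has no alpha channel.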