Image-content-based processing tasks

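There are two main tasks: object detection and image segmentation.
Object detection: accuracy tends to be relatively high. It locates each target object with a bounding box, i.e. the coordinates of the box around it. The model does comparatively little computation, so it is comparatively fast.
Image segmentation: accuracy tends to be relatively low. It represents each target object as a set of pixels, recovering the exact pixels along its outline. The model does comparatively more computation, so it is comparatively slow.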

Object detection

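Single-stage: also called region-free methods; predictions come directly from the model in one pass. Examples include YOLO, SSD and RetinaNet.
Two-stage: first detect the regions that contain objects, then classify the objects inside those regions. Examples include R-CNN, Faster R-CNN and Mask R-CNN.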
Two-stage detectors generally localize bounding boxes more accurately, while single-stage detectors are generally faster.

Image segmentation

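Semantic segmentation: separates the parts of an image that carry different semantics (classes).
Instance segmentation: traces the outline of each target object (finer than a bounding box) and, unlike semantic segmentation, also tells apart individual objects of the same class.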
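
The difference between the two kinds of segmentation can be illustrated with a toy sketch. The grid size, the class id and the helper name `to_semantic_map` are made up for illustration. Two objects of the same class are stored as separate instance masks; collapsing them into a single class-id map gives the semantic view, in which the two individuals can no longer be told apart.

```python
# Two instance masks on a 3x6 grid. Both objects share class id 1
# (a hypothetical "horse" class); 0 means background.
instances = [
    {"class_id": 1, "mask": [[1, 1, 0, 0, 0, 0],
                             [1, 1, 0, 0, 0, 0],
                             [0, 0, 0, 0, 0, 0]]},
    {"class_id": 1, "mask": [[0, 0, 0, 0, 0, 0],
                             [0, 0, 0, 0, 1, 1],
                             [0, 0, 0, 0, 1, 1]]},
]


def to_semantic_map(instances, height, width):
    """Collapse instance masks into one class-id map (0 = background)."""
    semantic = [[0] * width for _ in range(height)]
    for inst in instances:
        for y in range(height):
            for x in range(width):
                if inst["mask"][y][x]:
                    semantic[y][x] = inst["class_id"]
    return semantic


semantic = to_semantic_map(instances, 3, 6)
# Instance view: len(instances) == 2 separate objects.
# Semantic view: every object pixel just says "class 1".
```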

Non-maximum suppression (NMS)

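Object detection produces many candidate results, and the same object is often detected several times (with slightly different centers and sizes), so NMS is used to remove the duplicates. The procedure is:
Discard the boxes whose confidence is below a threshold, then pick the remaining box with the highest confidence.
Compute its overlap ratio (IOU) with each of the other remaining boxes.
Remove every box whose IOU with it exceeds a given threshold.
Repeat with the highest-confidence box that has not been processed yet, until none are left.
IOU (Intersection over Union): the area of the intersection of two boxes divided by the area of their union.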
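
The procedure above can be sketched in a few lines of plain Python. This is a minimal greedy version for illustration, assuming boxes are given as `(x1, y1, x2, y2)` tuples; the function names and the 0.5 threshold are choices made here, not something the notes prescribe. Real pipelines normally use an optimized library routine instead.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring box still in play
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)   # → [0, 2]: box 1 duplicates box 0 (IOU ≈ 0.82)
```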

The Mask R-CNN model

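It is a two-stage model. The region-detection stage works as follows:
Tile the image with many sub-boxes of different sizes, called anchors; anchors of different sizes overlap one another.
Annotate the image with the coordinates of each real object.
Use those coordinates and the IOU to decide which anchors are foreground (high IOU) and which are background (low IOU).
For each foreground anchor, compare its coordinates with the annotated object's coordinates and compute the relative offset and the width/height scaling ratio between the two.
Region detection thus becomes a classification task (foreground vs. background) plus a regression task (offset and scale) over a set of anchors: each image's annotations are converted into per-anchor labels, and the model is trained, or run, on this fixed set of anchors.
The network that performs region detection inside the model is called the Region Proposal Network (RPN). In practice, a fixed number of anchors with the highest foreground probability are selected from the RPN output as regions of interest (ROIs) and passed to the second-stage network.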
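
The anchor steps above can be sketched as follows. This is an illustrative, pure-Python simplification and not the code of any particular library: the helper names, the grid stride, the 0.7 / 0.3 IOU thresholds and the log-scale encoding of width and height are assumptions chosen for the example.

```python
import math


def make_anchors(img_w, img_h, stride, scales, ratios):
    """Tile anchors (x1, y1, x2, y2) centred on a regular grid over the image."""
    anchors = []
    for cy in range(stride // 2, img_h, stride):
        for cx in range(stride // 2, img_w, stride):
            for s in scales:
                for r in ratios:                  # r = height / width
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, fg_thr=0.7, bg_thr=0.3):
    """Return (label, matched_gt): 1 = foreground, 0 = background, -1 = ignored."""
    best = max(gt_boxes, key=lambda g: iou(anchor, g))
    overlap = iou(anchor, best)
    if overlap >= fg_thr:
        return 1, best
    if overlap < bg_thr:
        return 0, None
    return -1, None                               # ambiguous overlap, not trained on


def encode(anchor, gt):
    """Regression target: centre shift relative to anchor size, and log scale."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    ax, ay = anchor[0] + aw / 2, anchor[1] + ah / 2
    gx, gy = gt[0] + gw / 2, gt[1] + gh / 2
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


# One annotated object on a 64x64 image (made-up numbers)
gt_boxes = [(10, 8, 40, 40)]
anchors = make_anchors(64, 64, stride=16, scales=[32], ratios=[0.5, 1.0, 2.0])
labels = [label_anchor(a, gt_boxes) for a in anchors]
targets = [encode(a, gt) for a, (lab, gt) in zip(anchors, labels) if lab == 1]
```

Each image therefore yields one class label per anchor and, for the foreground anchors only, a four-number regression target.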

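Full pipeline:
Backbone feature extraction: the backbone network extracts features at several scales from the image, usually with a pretrained model (VGG, Inception, ResNet, etc.). The resulting features are called feature maps.
Feature fusion: a Feature Pyramid Network (FPN) merges the backbone's features from the different scales. The fused features feed both the RPN and the final classifier network.
ROI extraction: done mainly by the RPN. It predicts foreground/background scores and box offsets for the many anchors, removes duplicates among the ROIs with high foreground probability using NMS, and keeps a specified number of ROIs for the following steps.
ROI pooling: implemented with ROI Align. The fused feature map is treated like an image; the content at each ROI's box position is cropped from it and resized to a fixed shape for the following steps.
Final detection: each result of the previous step is processed in turn: it is classified, its box coordinates are set, and the object's pixels are segmented. This gives the final result.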
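
To make the ROI pooling step more concrete, here is a deliberately simplified sketch of the ROI Align idea in plain Python. It assumes a single-channel feature map stored as a list of rows, ROI coordinates already expressed in feature-map units, and one bilinear sample at the centre of each output bin; real implementations handle many channels and usually average several samples per bin. The function names are made up for this example.

```python
def bilinear(feature, x, y):
    """Bilinearly interpolate a 2-D feature map (list of rows) at point (x, y)."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)                 # clamp to the map
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size):
    """Resample the region roi = (x1, y1, x2, y2) to an out_size x out_size grid,
    taking one bilinear sample at the centre of each output bin."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [[bilinear(feature, x1 + (j + 0.5) * bin_w, y1 + (i + 0.5) * bin_h)
             for j in range(out_size)]
            for i in range(out_size)]


# 4x4 toy feature map whose value at (x, y) is x + 10 * y
feature = [[x + 10 * y for x in range(4)] for y in range(4)]
pooled = roi_align(feature, (0.5, 0.5, 2.5, 2.5), out_size=2)
# → [[11.0, 12.0], [21.0, 22.0]]
```

Because the sample points are not rounded to integer cells, ROIs of any size and position map to the same fixed output shape without the misalignment that plain cropping and rounding would introduce.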

Example: object detection and instance segmentation with Mask R-CNN

cv2.error: OpenCV(4.8.0) :-1: error: (-5:Bad argument)

in function 'putText'
Overload resolution failed:
Can't parse 'org'. Sequence item with index 0 has a wrong type

in function 'rectangle'
Overload resolution failed:
Can't parse 'pt1'. Sequence item with index 0 has a wrong type
argument for rectangle() given by name ('color') and position (3)
Both errors above occur because the coordinates passed to these two functions must be of type int:
cv2.rectangle()
cv2.putText()

The values inside the tuples can be cast to int like this:

# Box corners with float coordinates, as a detector might return them
test_list = [(4.2, 5.7), (6.0, 7.9), (1.5, 4.1), (8.8, 10.3)]

# Print the original list
print("The original list is : " + str(test_list))

# Change the datatype of the tuple values
# using enumerate() + a loop, converting each value with int()
for idx, (x, y) in enumerate(test_list):
    test_list[idx] = (int(x), int(y))

# Print the result
print("The converted records : " + str(test_list))

Full code:

from PIL import Image
import matplotlib.pyplot as plt
import torchvision.transforms as T
import torchvision
import numpy as np
import cv2
import random
import torch

# Load the model
model = torchvision.models.detection.maskrcnn_resnet50_fpn()
# The COCO weights are loaded from a local file here instead of being downloaded
model.load_state_dict(torch.load(r"pytorch\2-chapter1\some3\maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth"))
model.eval()  # switch to inference mode

COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush' ]

len(COCO_INSTANCE_CATEGORY_NAMES) #91

def get_prediction(img_path, threshold):
    # Read the image as RGB and convert it to a float tensor in [0, 1]
    img = Image.open(img_path).convert("RGB")
    transform = T.Compose([T.ToTensor()])
    img = transform(img)
    with torch.no_grad():  # inference only, no gradients needed
        pred = model([img])
    # Keep only the detections whose score exceeds the threshold
    scores = pred[0]['scores'].cpu().numpy()
    keep = scores > threshold
    # Binarize the soft masks at 0.5; squeeze only the channel axis so that
    # a single detection still has shape (1, H, W)
    masks = (pred[0]['masks'] > 0.5).squeeze(1).cpu().numpy()[keep]
    labels = pred[0]['labels'].cpu().numpy()[keep]
    boxes = pred[0]['boxes'].cpu().numpy()[keep]
    pred_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in labels]
    pred_boxes = [[(b[0], b[1]), (b[2], b[3])] for b in boxes]
    return masks, pred_boxes, pred_class


def random_colour_masks(image):
    # Paint the pixels of one binary mask with a randomly chosen colour
    colours = [[0, 255, 0],[0, 0, 255],[255, 0, 0],[0, 255, 255],[255, 255, 0],[255, 0, 255],[80, 70, 180],[250, 80, 190],[245, 145, 50],[70, 150, 250],[50, 190, 190]]
    r = np.zeros_like(image).astype(np.uint8)
    g = np.zeros_like(image).astype(np.uint8)
    b = np.zeros_like(image).astype(np.uint8)
    # random.choice can pick any entry; randrange(0, 10) never picked the last one
    randcol = random.choice(colours)
    r[image == 1] = randcol[0]
    g[image == 1] = randcol[1]
    b[image == 1] = randcol[2]
    coloured_mask = np.stack([r, g, b], axis=2)
    return coloured_mask, randcol

def instance_segmentation_api(img_path, threshold=0.5, rect_th=3, text_size=5, text_th=5):
    masks, boxes, pred_cls = get_prediction(img_path, threshold)  # run the model
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    for i in range(len(masks)):
        rgb_mask, randcol = random_colour_masks(masks[i])  # fill the mask region with a random colour
        img = cv2.addWeighted(img, 1, rgb_mask, 0.5, 0)

        # cv2.rectangle() and cv2.putText() only accept int coordinates
        pt1, pt2 = [(int(x), int(y)) for (x, y) in boxes[i]]

        cv2.rectangle(img, pt1, pt2, color=randcol, thickness=rect_th)
        cv2.putText(img, pred_cls[i], pt1, cv2.FONT_HERSHEY_SIMPLEX, text_size, randcol, thickness=text_th)
    # Create one figure after all masks are drawn (not one per mask)
    plt.figure(figsize=(20, 30))
    plt.imshow(img)
    plt.xticks([])
    plt.yticks([])
    plt.show()

# Show the model's result
instance_segmentation_api(r'E:\desktop\Home_Code\pytorch\2-chapter1\some1\horse.jpg')