Finally Understanding Non-Maximum Suppression (NMS)

2025/7/6 18:02:13

NMS removes redundant boxes from object detection results and keeps the most suitable box for each object.

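Take YOLOv5 as an example. The model's output concatenates the predictions from its three feature maps. With the batch dimension added, it is a 3-D tensor of shape $[batch,\ (p_0 \times p_0 + p_1 \times p_1 + p_2 \times p_2) \times 3,\ 4 + 1 + n_{cls}]$, where $p_i$ is the side length of the $i$-th feature map, 3 is the number of anchors per grid cell, and the last dimension holds the 4 box values, 1 objectness confidence, and $n_{cls}$ class scores. The boxes are given as $xywh$ relative to the resized input image. This tensor then goes into post-processing.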
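YOLOv5 encodes each box as a center point plus width and height, whereas the IoU formulas below work on corner coordinates. A minimal NumPy conversion sketch, assuming $x, y$ is the box center; the helper name `xywh2xyxy` is chosen here for illustration.

```python
import numpy as np


def xywh2xyxy(boxes: np.ndarray) -> np.ndarray:
    """Convert boxes from (cx, cy, w, h) to (x1, y1, x2, y2).

    Works on any array whose last axis starts with the 4 box values,
    e.g. a single box of shape [4] or a batch of shape [N, 4].
    """
    cx, cy, w, h = boxes[..., 0], boxes[..., 1], boxes[..., 2], boxes[..., 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)


# A 100x100 box centered at (150, 150) becomes [100, 100, 200, 200]
print(xywh2xyxy(np.array([150.0, 150.0, 100.0, 100.0])))
```

Using `...` indexing lets the same function handle one box, a list of boxes, or the raw `[1, 25200, 85]` output sliced to its first four columns.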

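Concretely, for a $640 \times 640$ input the predictions come from three scales, $20 \times 20$, $40 \times 40$ and $80 \times 80$, giving a total of $20 \times 20 \times 3 + 40 \times 40 \times 3 + 80 \times 80 \times 3 = 25200$ boxes. Inference usually processes one image at a time (batch = 1), and COCO has 80 classes, so each box carries $4 + 1 + 80 = 85$ values and the final output tensor is $[1, 25200, 85]$.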
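The arithmetic above can be checked in a few lines. This sketch assumes the standard YOLOv5 strides of 32, 16 and 8, which are what turn a 640-pixel input into 20, 40 and 80 grid cells per side.

```python
# Number of YOLOv5 predictions for a 640x640 input (3 anchors per grid cell)
input_size = 640
strides = [32, 16, 8]          # assumed strides -> 20x20, 40x40, 80x80 feature maps
anchors_per_cell = 3
num_classes = 80               # COCO

grid_sizes = [input_size // s for s in strides]
num_boxes = sum(g * g * anchors_per_cell for g in grid_sizes)
last_dim = 4 + 1 + num_classes  # xywh + objectness + class scores

print(grid_sizes)                  # [20, 40, 80]
print([1, num_boxes, last_dim])    # [1, 25200, 85]
```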

NMS is generally split into two steps: confidence suppression and IoU suppression.

Confidence suppression removes every box whose confidence is below a preset threshold and keeps the higher-confidence boxes. This step is very simple.
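In NumPy, confidence suppression is a single boolean mask. A minimal sketch on a tiny made-up tensor (the values are illustrative, not real model output) that thresholds the objectness column at index 4:

```python
import numpy as np

# Toy stand-in for the [1, 25200, 85] output: 1 image, 4 boxes, 2 classes -> last dim 4 + 1 + 2 = 7
pred = np.array([[
    [50, 50, 20, 20, 0.90, 0.8, 0.2],
    [52, 51, 20, 20, 0.75, 0.7, 0.3],
    [10, 10, 5, 5, 0.10, 0.6, 0.4],
    [80, 80, 10, 10, 0.60, 0.1, 0.9],
]])
conf_thres = 0.25

mask = pred[..., 4] > conf_thres   # shape [1, 4], True where confidence exceeds the threshold
kept = pred[mask]                  # boolean indexing flattens the batch axis -> shape [N, 7]
print(kept.shape)                  # (3, 7)
```

Because the mask covers both the batch and box axes, the result is a flat `[N, 85]`-style array of surviving boxes, which is the form the per-class step works on.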
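IoU (Intersection over Union) suppression is more involved. If the detector predicts multiple classes, IoU suppression is run separately for each class. Taking a high-confidence box as the reference, compute its IoU with another box of the same class; if the IoU exceeds a preset threshold, the other box is considered to cover the same object as the reference box and is removed.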
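Before the per-class comparison, each box needs a class label and each class's boxes need to be ordered by confidence so that the first one can serve as the reference. A small sketch with made-up values; `groups` is a hypothetical name mapping a class label to box indices in descending confidence order.

```python
import numpy as np

# Candidates after confidence filtering: [x, y, w, h, conf, score_cls0, score_cls1]
boxes = np.array([
    [50, 50, 20, 20, 0.75, 0.7, 0.3],
    [80, 80, 10, 10, 0.60, 0.1, 0.9],
    [52, 51, 20, 20, 0.90, 0.8, 0.2],
])

labels = np.argmax(boxes[:, 5:], axis=1)   # class of each box = index of its best class score
groups = {}
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    # Within a class, order candidates by confidence (highest first); the first is the reference box
    order = np.argsort(-boxes[idx, 4])
    groups[int(c)] = idx[order].tolist()

print(groups)   # {0: [2, 0], 1: [1]}
```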

How is IoU computed?

IoU measures how much two bounding boxes overlap and is defined as:
$IoU = \frac{|A \cap B|}{|A \cup B|}$

Suppose the two rectangles A and B are given by their top-left and bottom-right corners:

$A: [x_{a1}, y_{a1}, x_{a2}, y_{a2}]$

$B: [x_{b1}, y_{b1}, x_{b2}, y_{b2}]$

Suppose they are positioned as shown in Figure 3:

If the two boxes overlap, the top-left corner of the intersection is:
$x_1 = \max(x_{a1}, x_{b1}), \quad y_1 = \max(y_{a1}, y_{b1})$

The bottom-right corner of the intersection is:
$x_2 = \min(x_{a2}, x_{b2}), \quad y_2 = \min(y_{a2}, y_{b2})$

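Compute the intersection area (the $+1.0$ treats coordinates as inclusive pixel indices, and $\max(\cdot, 0)$ yields 0 when the boxes do not overlap):
$intersection = \max(x_2 - x_1 + 1.0, 0) \cdot \max(y_2 - y_1 + 1.0, 0)$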

The areas of boxes A and B are:
$S_A = (x_{a2} - x_{a1} + 1.0) \cdot (y_{a2} - y_{a1} + 1.0)$

$S_B = (x_{b2} - x_{b1} + 1.0) \cdot (y_{b2} - y_{b1} + 1.0)$

Compute the union area:
$union = S_A + S_B - intersection$

Compute the IoU:

$IoU = \frac{intersection}{union}$
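As a worked example of these formulas, take $A = [100, 100, 200, 200]$ and $B = [120, 120, 220, 220]$ (the same boxes used in the program in this post), with the $+1.0$ pixel convention:

```latex
\begin{aligned}
S_A &= (200 - 100 + 1)(200 - 100 + 1) = 101^2 = 10201, \qquad S_B = (220 - 120 + 1)^2 = 10201 \\
x_1 &= \max(100, 120) = 120, \quad y_1 = 120, \quad x_2 = \min(200, 220) = 200, \quad y_2 = 200 \\
intersection &= (200 - 120 + 1)(200 - 120 + 1) = 81^2 = 6561 \\
union &= 10201 + 10201 - 6561 = 13841 \\
IoU &= \frac{6561}{13841} \approx 0.474
\end{aligned}
```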

The IoU computation in Python:

import numpy as np


# Compute IoU for boxes in xyxy format; this function will be saved in box_utils.py
def box_iou_xyxy(box1, box2):
    # Top-left and bottom-right corners of box1
    x1min, y1min, x1max, y1max = box1[0], box1[1], box1[2], box1[3]
    # Area of box1
    s1 = (y1max - y1min + 1.) * (x1max - x1min + 1.)
    # Top-left and bottom-right corners of box2
    x2min, y2min, x2max, y2max = box2[0], box2[1], box2[2], box2[3]
    # Area of box2
    s2 = (y2max - y2min + 1.) * (x2max - x2min + 1.)

    # Corners of the intersection rectangle
    xmin = np.maximum(x1min, x2min)
    ymin = np.maximum(y1min, y2min)
    xmax = np.minimum(x1max, x2max)
    ymax = np.minimum(y1max, y2max)
    # Height, width and area of the intersection (clamped to 0 when the boxes do not overlap)
    inter_h = np.maximum(ymax - ymin + 1., 0.)
    inter_w = np.maximum(xmax - xmin + 1., 0.)
    intersection = inter_h * inter_w
    # Union area
    union = s1 + s2 - intersection
    # Intersection over union
    iou = intersection / union
    return iou


bbox1 = [100., 100., 200., 200.]
bbox2 = [120., 120., 220., 220.]
iou = box_iou_xyxy(bbox1, bbox2)
print('IoU is {}'.format(iou))
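The function above compares two boxes at a time, but NMS repeatedly compares one reference box against all remaining boxes. NumPy broadcasting handles that in one call. A sketch, assuming xyxy boxes and the same $+1$ pixel convention as `box_iou_xyxy`; the name `box_iou_one_to_many` is illustrative.

```python
import numpy as np


def box_iou_one_to_many(box, boxes):
    """IoU between one xyxy box [4] and N xyxy boxes [N, 4], using the +1 pixel convention."""
    boxes = np.asarray(boxes, dtype=float)
    # Intersection corners, broadcast over all N boxes
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(x2 - x1 + 1.0, 0.0) * np.maximum(y2 - y1 + 1.0, 0.0)
    area = (box[2] - box[0] + 1.0) * (box[3] - box[1] + 1.0)
    areas = (boxes[:, 2] - boxes[:, 0] + 1.0) * (boxes[:, 3] - boxes[:, 1] + 1.0)
    return inter / (area + areas - inter)


ref = [100.0, 100.0, 200.0, 200.0]
others = [[120.0, 120.0, 220.0, 220.0],   # partial overlap
          [100.0, 100.0, 200.0, 200.0],   # identical box
          [300.0, 300.0, 400.0, 400.0]]   # no overlap
print(box_iou_one_to_many(ref, others))
```

The first value matches the two-box function on the same inputs (about 0.474), the identical box gives 1.0 and the disjoint box gives 0.0.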

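In NMS, IoU is computed between two detection boxes of the same class: it is the ratio of the area where the boxes intersect to the area of their union. If the two boxes do not overlap at all, the intersection is 0, so the IoU is 0; the boxes are taken to predict different objects and both are kept. If the two boxes coincide exactly, the IoU is 1; they predict the same object, so one of them is removed. In between, the decision depends on whether the IoU exceeds the threshold.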
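The keep-or-remove rule described above can be written as a small predicate. This is a sketch: `should_suppress` is a hypothetical helper, the boxes are xyxy with the $+1$ pixel convention used earlier, and the default `iou_thres=0.45` is only an illustrative value.

```python
def iou_xyxy(a, b):
    # Same formula as above (inclusive pixel coordinates, hence the +1)
    iw = max(min(a[2], b[2]) - max(a[0], b[0]) + 1.0, 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]) + 1.0, 0.0)
    inter = iw * ih
    area_a = (a[2] - a[0] + 1.0) * (a[3] - a[1] + 1.0)
    area_b = (b[2] - b[0] + 1.0) * (b[3] - b[1] + 1.0)
    return inter / (area_a + area_b - inter)


def should_suppress(ref_box, ref_cls, box, cls, iou_thres=0.45):
    """Hypothetical helper: True if `box` duplicates the higher-confidence `ref_box`."""
    if cls != ref_cls:      # boxes of different classes are never compared
        return False
    return iou_xyxy(ref_box, box) > iou_thres


ref = [100, 100, 200, 200]
print(should_suppress(ref, 0, [100, 100, 200, 200], 0))  # identical box, same class -> True
print(should_suppress(ref, 0, [300, 300, 400, 400], 0))  # no overlap (IoU = 0) -> False
print(should_suppress(ref, 0, [100, 100, 200, 200], 1))  # identical box, different class -> False
```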

Implementing IoU suppression

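import numpy as np


def xywh2xyxy(box):
    # YOLOv5 boxes are (center x, center y, w, h); convert to corners (x1, y1, x2, y2)
    x, y, w, h = box[0], box[1], box[2], box[3]
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def getInter(box1, box2):
    # Intersection area of two xywh boxes (continuous coordinates, so no +1 offset)
    ax1, ay1, ax2, ay2 = xywh2xyxy(box1)
    bx1, by1, bx2, by2 = xywh2xyxy(box2)
    inter_w = max(min(ax2, bx2) - max(ax1, bx1), 0.0)
    inter_h = max(min(ay2, by2) - max(ay1, by1), 0.0)
    return inter_w * inter_h


def getIou(box1, box2, interArea):
    # IoU = intersection / (area1 + area2 - intersection); for xywh boxes the area is w * h
    union = box1[2] * box1[3] + box2[2] * box2[3] - interArea
    return interArea / union if union > 0 else 0.0


# pred has shape [1, 25200, 85]; conf_thres is the confidence threshold, iou_thres the IoU threshold
# Returns a list of [x, y, w, h, conf, cls] arrays
def nms(pred, conf_thres, iou_thres):
    # Confidence suppression: drop boxes whose confidence is below the threshold
    conf = pred[..., 4] > conf_thres
    box = pred[conf]                         # shape [N, 85]
    # Columns 5..end of every box: the per-class scores
    cls_conf = box[..., 5:]
    # Class of each box = index of its highest class score
    cls = np.argmax(cls_conf, axis=1)
    # Distinct class labels present, e.g. [0, 17]
    total_cls = np.unique(cls)
    output_box = []                          # final output boxes
    # Run IoU suppression separately for each class
    for clss in total_cls:
        # All candidates of the current class: keep [x, y, w, h, conf] and store the label in column 5
        cls_box = box[cls == clss][:, :6].copy()
        cls_box[:, 5] = clss
        # Sort by confidence, highest first, so cls_box[0] is always the best remaining box
        cls_box = cls_box[np.argsort(-cls_box[:, 4])]
        while len(cls_box) > 0:
            # Keep the highest-confidence remaining box as the reference box
            max_conf_box = cls_box[0]
            output_box.append(max_conf_box)
            cls_box = np.delete(cls_box, 0, 0)
            del_index = []
            for j in range(len(cls_box)):
                current_box = cls_box[j]  # current candidate
                interArea = getInter(max_conf_box, current_box)
                iou = getIou(max_conf_box, current_box, interArea)  # IoU with the reference box
                if iou > iou_thres:
                    del_index.append(j)  # same object as the reference box, mark for removal
            cls_box = np.delete(cls_box, del_index, 0)  # remove this round's duplicates
    return output_box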
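The loop version above calls `getInter`/`getIou` once per box pair. A vectorized variant computes the IoU of the current best box against all remaining boxes of its class in one NumPy expression. This is a sketch under the same assumptions (objectness-only confidence, center-format input); `nms_vectorized`, its default thresholds, and the corner-format output are illustrative choices, not YOLOv5's own post-processing.

```python
import numpy as np


def nms_vectorized(pred, conf_thres=0.25, iou_thres=0.45):
    """Per-class greedy NMS on a [1, N, 5 + num_classes] YOLOv5-style array.

    Confidence is the objectness column only, as in the loop version above.
    Returns an [M, 6] array of [x1, y1, x2, y2, conf, cls] rows.
    """
    x = pred[0]
    x = x[x[:, 4] > conf_thres]  # confidence suppression
    # Center xywh -> corner xyxy
    xyxy = np.stack([x[:, 0] - x[:, 2] / 2, x[:, 1] - x[:, 3] / 2,
                     x[:, 0] + x[:, 2] / 2, x[:, 1] + x[:, 3] / 2], axis=1)
    conf = x[:, 4]
    cls = np.argmax(x[:, 5:], axis=1)  # best-scoring class per box
    area = (xyxy[:, 2] - xyxy[:, 0]) * (xyxy[:, 3] - xyxy[:, 1])
    keep = []
    for c in np.unique(cls):
        idx = np.where(cls == c)[0]
        idx = idx[np.argsort(-conf[idx])]  # highest confidence first
        while idx.size > 0:
            i, rest = idx[0], idx[1:]
            keep.append(i)  # the current best box survives
            # IoU of box i with every remaining box of this class, in one broadcast
            iw = np.maximum(np.minimum(xyxy[i, 2], xyxy[rest, 2]) - np.maximum(xyxy[i, 0], xyxy[rest, 0]), 0.0)
            ih = np.maximum(np.minimum(xyxy[i, 3], xyxy[rest, 3]) - np.maximum(xyxy[i, 1], xyxy[rest, 1]), 0.0)
            inter = iw * ih
            iou = inter / np.maximum(area[i] + area[rest] - inter, 1e-12)
            idx = rest[iou <= iou_thres]  # drop boxes that duplicate box i
    keep = np.array(keep, dtype=int)
    return np.concatenate([xyxy[keep], conf[keep, None], cls[keep, None].astype(float)], axis=1)


# Tiny made-up example: 5 boxes, 2 classes -> last dim 4 + 1 + 2 = 7
pred = np.array([[
    [50, 50, 20, 20, 0.90, 0.8, 0.2],  # class 0
    [52, 51, 20, 20, 0.75, 0.7, 0.3],  # class 0, heavy overlap with box 0
    [10, 10, 5, 5, 0.10, 0.6, 0.4],    # below the confidence threshold
    [80, 80, 10, 10, 0.60, 0.1, 0.9],  # class 1
    [51, 50, 20, 20, 0.70, 0.2, 0.8],  # class 1, overlaps box 0 but belongs to another class
]])
out = nms_vectorized(pred)
print(out)
```

In this example the second class-0 box is removed (IoU with the first is about 0.75), while the overlapping class-1 box survives because boxes of different classes are never compared.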