YOLO post-processing trick: reducing NMS computation, comparisons, and memory use

2025/3/31 19:59:37

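Preface

1. Problem analysis

Problem 1: sorting

Problem 2: non-maximum suppression

2. Reducing comparisons and computation

Optimization 1: skip the reshape and filter by confidence directly

Optimization 2: reduce the number of bboxes passed to NMS

3. An example

Summary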

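Preface

This post is about reducing the number of computations and comparisons in YOLO's NMS post-processing.

The YOLO-det output has shape (1, 4+cls_num, 8400). Running NMS on it directly performs a large amount of useless, repeated work: comparisons during sorting, arithmetic, and so on.

1. Problem analysis

Python code for NMS: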

import numpy as np

# compute_iou(box, boxes) must return the IoU of `box` with each row of `boxes` (xyxy format)
def nms(boxes, scores, iou_threshold):
    boxes = np.array(boxes)
    scores = np.array(scores)

    # Sort indices by score, highest first
    sorted_indices = np.argsort(scores)[::-1]

    keep_boxes = []
    while sorted_indices.size > 0:
        # Pick the highest-scoring remaining box
        box_id = sorted_indices[0]
        keep_boxes.append(box_id)

        # Compute IoU of the picked box with the rest
        ious = compute_iou(boxes[box_id, :], boxes[sorted_indices[1:], :])

        # Keep only the boxes whose IoU with the picked box is below the threshold
        keep_indices = np.where(ious < iou_threshold)[0]

        # +1 because ious was computed against sorted_indices[1:]
        sorted_indices = sorted_indices[keep_indices + 1]

    return keep_boxes
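The listing calls compute_iou without showing it. Below is a minimal sketch of such a helper, assuming boxes in xyxy format (which is what the C++ code later passes to NMS). Only the function name is taken from the call above; the body is an assumption, not the author's implementation.

```python
import numpy as np


def compute_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all in xyxy format."""
    # Corners of the intersection rectangle
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])

    # Width and height are clamped at zero when the boxes do not overlap
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)

    box_area = (box[2] - box[0]) * (box[3] - box[1])
    boxes_area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = box_area + boxes_area - inter
    return inter / union
```

The helper also accepts an empty (0, 4) array, which is what nms passes once only one box is left.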

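Problem 1: sorting

The code above sorts all detections first and only then computes IoU. Since lg 8400 ≈ 13, the sort alone performs roughly n lg n ≈ 8400 × 13 ≈ 110,000 comparisons.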
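The n lg n figure can be checked with a few lines. This is only a back-of-the-envelope sketch: the function name is hypothetical, and a real sort's comparison count depends on the algorithm and the data.

```python
import math


def estimated_sort_comparisons(n):
    """Rough n * log2(n) comparison count for sorting n scores."""
    return n * math.log2(n)


# All 8400 raw predictions versus a hundred or so that survive a confidence filter
for n in (8400, 100):
    print(n, round(estimated_sort_comparisons(n)))
```

Sorting only the boxes that pass the confidence filter shrinks n before the n lg n cost is paid.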

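Problem 2: non-maximum suppression

Non-maximum suppression compares boxes pairwise, so its time complexity is O(n^2), which is even higher. In practice, though, confidence filtering has already been applied, so it usually only has to handle a hundred or so candidates.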

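2. Reducing comparisons and computation

The C++ NMS code follows the Python implementation and does not need to change. What changes is how the bboxes are prepared before they are passed in: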

        // Needs <cmath>, <iostream> and <vector>; output_data and output_size come from the inference call.
        int num_channels = 5; // 4 box values (cx, cy, w, h) + 1 class score
        int num_detections = output_size / num_channels;

        // Confidence threshold
        float confidence_threshold = 0.5f; // adjust as needed

        // Collect the boxes and confidences that pass the threshold,
        // merging near-duplicate neighbours by xywh along the way
        std::vector<std::vector<float>> filtered_boxes;
        std::vector<float> confidences;

        for (int i = 0; i < num_detections; ++i) {
            float cx = output_data[i];                        // center x
            float cy = output_data[num_detections + i];       // center y
            float w = output_data[2 * num_detections + i];    // width
            float h = output_data[3 * num_detections + i];    // height
            float confidence = output_data[4 * num_detections + i]; // confidence

            if (confidence > confidence_threshold) {
                if (!filtered_boxes.empty()) {
                    // The most recently kept box
                    const std::vector<float>& last_box = filtered_boxes.back();

                    // Compare the xywh of the current box against the last kept box
                    if (std::fabs(cx - last_box[0]) < 1.0f &&
                        std::fabs(cy - last_box[1]) < 1.0f &&
                        std::fabs(w - last_box[2]) < 1.0f &&
                        std::fabs(h - last_box[3]) < 1.0f) {

                        // If the current confidence is higher, replace the last kept box
                        if (confidence > confidences.back()) {
                            filtered_boxes.back() = {cx, cy, w, h};
                            confidences.back() = confidence;
                        }
                        continue; // the current box was dropped or merged; skip the push_back below
                    }
                }

                // No similar previous box, so keep the current one
                filtered_boxes.push_back({cx, cy, w, h});
                confidences.push_back(confidence);
            }
        }

        // Convert xywh to xyxy
        std::vector<std::vector<float>> boxes;
        boxes.reserve(filtered_boxes.size());

        for (const auto& box : filtered_boxes) {
            float cx = box[0];
            float cy = box[1];
            float w = box[2];
            float h = box[3];
            float x1 = cx - w / 2.0f;
            float y1 = cy - h / 2.0f;
            float x2 = cx + w / 2.0f;
            float y2 = cy + h / 2.0f;
            boxes.push_back({x1, y1, x2, y2});
        }

        std::cout << "Boxes size before NMS: " << boxes.size() << std::endl;

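Optimization 1: skip the reshape and filter by confidence directly

The original Python code rearranges the inference output from (1, 4+cls_num, 8400) to (8400, 4+cls_num) before doing anything else (strictly a transpose, loosely called a reshape here), so every later operation runs over all of the raw predictions.

In C++ a multi-dimensional array is just a one-dimensional array in memory, so a single loop over it is enough. Whenever the confidence passes the threshold, the values are saved to a new variable, filtered_boxes. (To avoid allocating extra space, the values could instead be written to the front of the original vector while tracking the end index.)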
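The indexing idea can be sketched in Python on a flat list. This assumes the channel-major layout used by the C++ code (channel k of detection i sits at index k * num_detections + i) and a single class, so 5 channels. Both function names are hypothetical; the second function illustrates the in-place variant mentioned in parentheses.

```python
def filter_by_confidence(output_data, num_detections, threshold):
    """Single pass over a flat, channel-major buffer of 5 * num_detections values."""
    boxes, confidences = [], []
    for i in range(num_detections):
        confidence = output_data[4 * num_detections + i]
        if confidence > threshold:
            # Gather cx, cy, w, h for detection i without any reshape or transpose
            boxes.append([output_data[k * num_detections + i] for k in range(4)])
            confidences.append(confidence)
    return boxes, confidences


def compact_in_place(output_data, num_detections, threshold):
    """Move passing detections to the front of each channel; return how many were kept."""
    kept = 0
    for i in range(num_detections):
        if output_data[4 * num_detections + i] > threshold:
            # kept <= i, so this never overwrites a detection that is still unread
            for k in range(5):
                output_data[k * num_detections + kept] = output_data[k * num_detections + i]
            kept += 1
    return kept
```

After compact_in_place returns, the first `kept` slots of every channel hold the surviving detections and the rest of the buffer can be ignored.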

        // Collect the boxes and confidences that pass the confidence threshold
        std::vector<std::vector<float>> filtered_boxes;
        std::vector<float> confidences;

        for (int i = 0; i < num_detections; ++i) {
            float cx = output_data[i];                        // center x
            float cy = output_data[num_detections + i];       // center y
            float w = output_data[2 * num_detections + i];    // width
            float h = output_data[3 * num_detections + i];    // height
            float confidence = output_data[4 * num_detections + i]; // confidence

            if (confidence > confidence_threshold) {
                // Handle a detection that passes the confidence threshold
            }
        }

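Optimization 2: reduce the number of bboxes passed to NMS

After optimization 1, 91 bboxes remain for NMS. Because inference proceeds in a "sliding-window" fashion, "duplicate" predictions end up adjacent in the output order, and their xywh values usually differ by less than 1.0.

So a single pass is enough: compare the current box with the last box that was kept, and if all four differences are within 1.0, keep whichever of the two has the higher confidence: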

if (!filtered_boxes.empty()) {
    // The most recently kept box
    const std::vector<float>& last_box = filtered_boxes.back();

    // Compare the xywh of the current box against the last kept box
    if (std::fabs(cx - last_box[0]) < 1.0f &&
        std::fabs(cy - last_box[1]) < 1.0f &&
        std::fabs(w - last_box[2]) < 1.0f &&
        std::fabs(h - last_box[3]) < 1.0f) {

        // If the current confidence is higher, replace the last kept box
        if (confidence > confidences.back()) {
            filtered_boxes.back() = {cx, cy, w, h};
            confidences.back() = confidence;
        }
        continue; // the current box was dropped or merged; skip the rest of this loop iteration
    }
}
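The same neighbour-merging pass, sketched in Python so it can be tried on a handful of boxes. The function name and the `tol` parameter are hypothetical; the 1.0 default mirrors the threshold in the C++ snippet.

```python
def merge_adjacent_duplicates(boxes, confidences, tol=1.0):
    """Collapse runs of consecutive near-identical xywh boxes, keeping the best score."""
    kept_boxes, kept_confs = [], []
    for box, conf in zip(boxes, confidences):
        if kept_boxes and all(abs(a - b) < tol for a, b in zip(box, kept_boxes[-1])):
            # Near-duplicate of the last kept box: keep whichever scores higher
            if conf > kept_confs[-1]:
                kept_boxes[-1] = list(box)
                kept_confs[-1] = conf
            continue
        kept_boxes.append(list(box))
        kept_confs.append(conf)
    return kept_boxes, kept_confs
```

Only consecutive duplicates are merged. A near-identical box that shows up later in the sequence, after a different box, is kept and left for NMS to resolve, so this pass reduces the NMS input rather than replacing NMS.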

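After this step, the number of bboxes passed to NMS drops from 91 to 34:

IoU comparisons originally: 91 × 91 ÷ 2 ≈ 4140.

IoU comparisons after the change: 34 × 34 ÷ 2 = 578, plus 91 xywh comparisons. That is an 86% reduction in IoU comparisons. (An xywh comparison only involves 4 subtractions, far cheaper than an IoU computation.)

The estimate is somewhat off (real NMS masks out suppressed boxes, which cuts the work considerably), but it still gives an intuitive picture of how much the comparison count drops.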
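The arithmetic above can be reproduced in a few lines. The n × n ÷ 2 formula is the article's rough estimate; the exact number of unordered pairs, n(n-1)/2, is included only for comparison, and both function names are hypothetical.

```python
def pairwise_estimate(n):
    """The article's rough estimate of IoU comparisons for n boxes: n * n / 2."""
    return n * n / 2


def exact_pairs(n):
    """Exact number of unordered pairs among n boxes: n * (n - 1) / 2."""
    return n * (n - 1) // 2


before = pairwise_estimate(91)   # 4140.5
after = pairwise_estimate(34)    # 578.0
reduction = 1 - after / before   # about 0.86

exact_before = exact_pairs(91)   # 4095
exact_after = exact_pairs(34)    # 561
print(before, after, round(reduction, 2), exact_before, exact_after)
```

Either way of counting gives a reduction of roughly 86%, so the rounding in the estimate does not change the conclusion.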