基于YOLOv5的工地安全帽、口罩检测系统

1，YOLOv5模型原理介绍

1.1 输入侧

1.1.1 数据增强

1.1.2 自适应锚框计算

1.1.3 自适应图片缩放

1.2 Backbone

1.3 Neck

1.4 输出端

2 ，基于YOLOv5的工地安全帽、口罩检测系统实现流程

2.1 整体项目

2.2 代码展示

2.3 效果展示

1，YOLOv5模型原理介绍

YOLOv5官方代码中，给出的目标检测网络中一共有4个版本，分别是YOLOv5s、YOLOv5m、YOLOv5l、YOLOv5x四个模型。

YOLOv5s整体的网络结构图

1.1 输入侧

YOLO v5使用Mosaic数据增强操作提升模型的训练速度和网络的精度；并提出了一种自适应锚框计算与自适应图片缩放方法

1.1.1 数据增强

YOLOv5中使用的Mosaic是参考2019年底提出的CutMix数据增强的方式，但CutMix只使用了两张图片进行拼接，而Mosaic数据增强则采用了4张图片，随机缩放、随机裁剪、随机排布的方式进行拼接，对于小目标的检测效果是不错的。

1.1.2 自适应锚框计算

在YOLO算法中，针对不同的数据集，都会有初始设定长宽的锚框。在网络训练中，网络在初始锚框的基础上输出预测框，进而和真实框groundtruth进行比对，计算两者差距，再反向更新，迭代网络参数，因此初始锚框是比较重要的一部分。

在YOLOv3、YOLOv4中，训练不同的数据集时，计算初始锚框的值是通过单独的程序运行的。但YOLOv5中将此功能嵌入到代码中，每次训练将会自适应的计算不同训练集中的最佳锚框值。如果觉得计算的锚框效果不好，可以将自动计算锚框功能关闭。具体操作为train.py中下面一行代码，设置成False。

1.1.3 自适应图片缩放

1.2 Backbone

1.3 Neck

可以看到经过几次下采样，三个紫色箭头指向的地方，输出分别是76*76、38*38、19*19。

以及最后的Prediction中用于预测的三个特征图①19*19*255、②38*38*255、③76*76*255。[注：255表示80类别(1+4+80)×3=255]

我们将Neck部分用立体图画出来，更直观的看下两部分之间是如何通过FPN结构融合的。

1.4 输出端

1）Bounding box损失函数

目标检测任务的损失函数一般由Classificition Loss（分类损失函数）和Bounding Box Regeression Loss（回归损失函数）两部分构成。Bounding Box Regeression的Loss近些年的发展过程是：Smooth L1 Loss-> IoU Loss（2016）-> GIoU Loss（2019）-> DIoU Loss（2020）->CIoU Loss（2020）

a.IOU_Loss

问题1：即状态1的情况，当预测框和目标框不相交时，IOU=0，无法反应两个框距离的远近，此时损失函数不可导，IOU_Loss无法优化两个框不相交的情况。

问题2：即状态2和状态3的情况，当两个预测框大小相同，两个IOU也相同，IOU_Loss无法区分两者相交情况的不同。

b.GIOU_Loss

问题：状态1、2、3都是预测框在目标框内部且预测框大小一致的情况，这时预测框和目标框的差集都是相同的，因此这三种状态的GIOU值也都是相同的，这时GIOU退化成了IOU，无法区分相对位置关系。

c.DIOU_Loss

好的目标框回归函数应该考虑三个重要几何因素：重叠面积、中心点距离，长宽比。

针对IOU和GIOU存在的问题，作者从两个方面进行考虑

如何最小化预测框和目标框之间的归一化距离？如何在预测框和目标框重叠时，回归的更准确？

针对第一个问题，提出了DIOU_Loss（Distance_IOU_Loss）

比如上面三种情况，目标框包裹预测框，本来DIOU_Loss可以起作用。但预测框的中心点的位置都是一样的，因此按照DIOU_Loss的计算公式，三者的值都是相同的。针对这个问题，又提出了CIOU_Loss。

d.CIOU_Loss

CIOU_Loss和DIOU_Loss前面的公式都是一样的，不过在此基础上还增加了一个影响因子，将预测框和目标框的长宽比都考虑了进去。其中v是衡量长宽比一致性的参数。

这样CIOU_Loss就将目标框回归函数应该考虑三个重要几何因素：重叠面积、中心点距离，长宽比全都考虑进去了。

IOU_Loss：主要考虑检测框和目标框重叠面积。

GIOU_Loss：在IOU的基础上，解决边界框不重合时的问题。

DIOU_Loss：在IOU和GIOU的基础上，考虑边界框中心点距离的信息。

CIOU_Loss：在DIOU的基础上，考虑边界框宽高比的尺度信息。

Yolov5中采用其中的CIOU_Loss做Bounding box的损失函数。

（2）nms非极大值抑制

在目标检测的后处理过程中，针对很多目标框的筛选，通常需要nms操作。因为CIOU_Loss中包含影响因子v，涉及groudtruth的信息，而测试推理时，是没有groundtruth的。所以Yolov4在DIOU_Loss的基础上采用DIOU_nms的方式，而Yolov5中采用加权nms的方式。可以看出，采用DIOU_nms，下方中间箭头的黄色部分，原本被遮挡的摩托车也可以检出。

项目中采用DIOU_nms的方式，在同样的参数情况下，将nms中IOU修改成DIOU_nms。对于一些遮挡重叠的目标，确实会有一些改进。

3、小目标分割检测
目标检测发展很快，但对于小目标的检测还是有一定的瓶颈，特别是大分辨率图像小目标检测。比如7920*2160，甚至16000*16000的图像。图像的分辨率很大，但又有很多小的目标需要检测。但是如果直接输入检测网络，比如yolov3，检出效果并不好。

主要原因是：

（1）小目标尺寸

以网络的输入608*608为例，yolov3、yolov4，yolov5中下采样都使用了5次，因此最后的特征图大小是19*19，38*38，76*76。

三个特征图中，最大的76*76负责检测小目标，而对应到608*608上，每格特征图的感受野是608/76=8*8大小。

再将608*608对应到7680*2160上，以最长边7680为例，7680/608*8=101。即如果原始图像中目标的宽或高小于101像素，网络很难学习到目标的特征信息。（PS：这里忽略多尺度训练的因素及增加网络检测分支的情况）

（2）高分辨率

而在很多遥感图像中，长宽比的分辨率比7680*2160更大，比如上面的16000*16000，如果采用直接输入原图的方式，很多小目标都无法检测出。

（3）显卡爆炸

很多图像分辨率很大，如果简单的进行下采样，下采样的倍数太大，容易丢失数据信息。但是倍数太小，网络前向传播需要在内存中保存大量的特征图，极大耗尽GPU资源,很容易发生显存爆炸，无法正常的训练及推理。因此可以借鉴2018年YOLT算法的方式，改变一下思维，对大分辨率图片先进行分割，变成一张张小图，再进行检测。

需要注意的是：为了避免两张小图之间，一些目标正好被分割截断，所以两个小图之间设置overlap重叠区域，比如分割的小图是960*960像素大小，则overlap可以设置为960*20%=192像素。

每个小图检测完成后，再将所有的框放到大图上，对大图整体做一次nms操作，将重叠区域的很多重复框去除。这样操作，可以将很多小目标检出，比如16000*16000像素的遥感图像。

优点：

（1）准确性

分割后的小图，再输入目标检测网络中，对于最小目标像素的下限会大大降低。

（2）检测方式

在大分辨率图像，比如遥感图像，或者无人机图像，如果无需考虑实时性的检测，且对小目标检测也有需求的项目，可以尝试此种方式。

缺点：

（1）增加计算量

比如原本7680*2160的图像，如果使用直接大图检测的方式，一次即可检测完。但采用分割的方式，切分成N张608*608大小的图像，再进行N次检测，会大大增加检测时间。

2 ，基于YOLOv5的工地安全帽、口罩检测系统实现流程

2.1 整体项目

其中mian_win 是gui界面由pyqt绘制，然后导出为.py文件
detect.py 是系统自带的测试文件，可以用来测试是否正确安装环境
train.py 用来训练自己的数据
test.py用来测试
models 存放的是各种.yaml文件
run存放的是训练和测试的结果
mian.py是检测系统的主函数

2.2 代码展示

from PyQt5.QtWidgets import QApplication, QMainWindow, QFileDialog, QMenu, QAction
from main_win.win import Ui_mainWindow
from PyQt5.QtCore import Qt, QPoint, QTimer, QThread, pyqtSignal
from PyQt5.QtGui import QImage, QPixmap, QPainter, QIcon

import sys
import os
import json
import numpy as np
import torch
import torch.backends.cudnn as cudnn
import os
import time
import cv2

from models.experimental import attempt_load
from utils.datasets import LoadImages, LoadWebcam
from utils.CustomMessageBox import MessageBox
# LoadWebcam 的最后一个返回值改为 self.cap
from utils.general import check_img_size, check_requirements, check_imshow, colorstr, non_max_suppression, \
    apply_classifier, scale_coords, xyxy2xywh, strip_optimizer, set_logging, increment_path, save_one_box
from utils.plots import colors, plot_one_box, plot_one_box_PIL
from utils.torch_utils import select_device, load_classifier, time_sync
from utils.capnums import Camera
from dialog.rtsp_win import Window


class DetThread(QThread):
    send_img = pyqtSignal(np.ndarray)
    send_raw = pyqtSignal(np.ndarray)
    send_statistic = pyqtSignal(dict)
    # 发送信号：正在检测/暂停/停止/检测结束/错误报告
    send_msg = pyqtSignal(str)
    send_percent = pyqtSignal(int)
    send_fps = pyqtSignal(str)

    def __init__(self):
        super(DetThread, self).__init__()
        self.weights = './yolov5s.pt'           # 设置权重
        self.current_weight = './yolov5s.pt'    # 当前权重
        self.source = '0'                       # 视频源
        self.conf_thres = 0.25                  # 置信度
        self.iou_thres = 0.45                   # iou
        self.jump_out = False                   # 跳出循环
        self.is_continue = True                 # 继续/暂停
        self.percent_length = 1000              # 进度条
        self.rate_check = True                  # 是否启用延时
        self.rate = 100                         # 延时HZ
        self.save_fold = './result'             # 保存文件夹

    @torch.no_grad()
    def run(self,
            imgsz=640,  # inference size (pixels)
            max_det=1000,  # maximum detections per image
            device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
            view_img=True,  # show results
            save_txt=False,  # save results to *.txt
            save_conf=False,  # save confidences in --save-txt labels
            save_crop=False,  # save cropped prediction boxes
            nosave=False,  # do not save images/videos
            classes=None,  # filter by class: --class 0, or --class 0 2 3
            agnostic_nms=False,  # class-agnostic NMS
            augment=False,  # augmented inference
            visualize=False,  # visualize features
            update=False,  # update all models
            project='runs/detect',  # save results to project/name
            name='exp',  # save results to project/name
            exist_ok=False,  # existing project/name ok, do not increment
            line_thickness=3,  # bounding box thickness (pixels)
            hide_labels=False,  # hide labels
            hide_conf=False,  # hide confidences
            half=False,  # use FP16 half-precision inference
            ):

        # Initialize
        try:
            device = select_device(device)
            half &= device.type != 'cpu'  # half precision only supported on CUDA

            # Load model
            model = attempt_load(self.weights, map_location=device)  # load FP32 model
            num_params = 0
            for param in model.parameters():
                num_params += param.numel()
            stride = int(model.stride.max())  # model stride
            imgsz = check_img_size(imgsz, s=stride)  # check image size
            names = model.module.names if hasattr(model, 'module') else model.names  # get class names
            if half:
                model.half()  # to FP16

            # Dataloader
            if self.source.isnumeric() or self.source.lower().startswith(('rtsp://', 'rtmp://', 'http://', 'https://')):
                view_img = check_imshow()
                cudnn.benchmark = True  # set True to speed up constant image size inference
                dataset = LoadWebcam(self.source, img_size=imgsz, stride=stride)
                # bs = len(dataset)  # batch_size
            else:
                dataset = LoadImages(self.source, img_size=imgsz, stride=stride)

            # Run inference
            if device.type != 'cpu':
                model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
            count = 0
            # 跳帧检测
            jump_count = 0
            start_time = time.time()
            dataset = iter(dataset)

            while True:
                # 手动停止
                if self.jump_out:
                    self.vid_cap.release()
                    self.send_percent.emit(0)
                    self.send_msg.emit('停止')
                    if hasattr(self, 'out'):
                        self.out.release()
                    break
                # 临时更换模型
                if self.current_weight != self.weights:
                    # Load model
                    model = attempt_load(self.weights, map_location=device)  # load FP32 model
                    num_params = 0
                    for param in model.parameters():
                        num_params += param.numel()
                    stride = int(model.stride.max())  # model stride
                    imgsz = check_img_size(imgsz, s=stride)  # check image size
                    names = model.module.names if hasattr(model, 'module') else model.names  # get class names
                    if half:
                        model.half()  # to FP16
                    # Run inference
                    if device.type != 'cpu':
                        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
                    self.current_weight = self.weights
                # 暂停开关
                if self.is_continue:
                    path, img, im0s, self.vid_cap = next(dataset)
                    # jump_count += 1
                    # if jump_count % 5 != 0:
                    #     continue
                    count += 1
                    # 每三十帧刷新一次输出帧率
                    if count % 30 == 0 and count >= 30:
                        fps = int(30/(time.time()-start_time))
                        self.send_fps.emit('fps：'+str(fps))
                        start_time = time.time()
                    if self.vid_cap:
                        percent = int(count/self.vid_cap.get(cv2.CAP_PROP_FRAME_COUNT)*self.percent_length)
                        self.send_percent.emit(percent)
                    else:
                        percent = self.percent_length

                    statistic_dic = {name: 0 for name in names}
                    img = torch.from_numpy(img).to(device)
                    img = img.half() if half else img.float()  # uint8 to fp16/32
                    img /= 255.0  # 0 - 255 to 0.0 - 1.0
                    if img.ndimension() == 3:
                        img = img.unsqueeze(0)

                    pred = model(img, augment=augment)[0]

                    # Apply NMS
                    pred = non_max_suppression(pred, self.conf_thres, self.iou_thres, classes, agnostic_nms, max_det=max_det)
                    # Process detections
                    for i, det in enumerate(pred):  # detections per image
                        im0 = im0s.copy()

                        if len(det):
                            # Rescale boxes from img_size to im0 size
                            det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                            # Write results
                            for *xyxy, conf, cls in reversed(det):
                                c = int(cls)  # integer class
                                statistic_dic[names[c]] += 1
                                label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                                # im0 = plot_one_box_PIL(xyxy, im0, label=label, color=colors(c, True), line_thickness=line_thickness)  # 中文标签画框，但是耗时会增加
                                plot_one_box(xyxy, im0, label=label, color=colors(c, True),
                                             line_thickness=line_thickness)

                    # 控制视频发送频率
                    if self.rate_check:
                        time.sleep(1/self.rate)
                    self.send_img.emit(im0)
                    self.send_raw.emit(im0s if isinstance(im0s, np.ndarray) else im0s[0])
                    self.send_statistic.emit(statistic_dic)
                    # 如果自动录制
                    if self.save_fold:
                        os.makedirs(self.save_fold, exist_ok=True)  # 路径不存