YOLO-V5 Series: Algorithm and Code Analysis (Part 4)

Table of Contents

- Auxiliary Tools
- Network Configuration File
- Network Construction
- Network Inference
- Drawing the Network Structure

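Auxiliary Tools

Visualization tools let us see the network structure. They help us read the code and, in turn, hand-draw a clear diagram of the network, with the ultimate goal of understanding the whole architecture. This lays a solid foundation for studying YOLOv5 in depth.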

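TensorBoard
To visualize the training log files (in the exp directory), run `tensorboard --logdir="<log path>"`, as shown in the figure below:

Then open http://localhost:6006/ in a browser. As the figure below shows, you can clearly see the overall network structure, the connections between nodes, and each module's structure, name, input and output shapes, and convolution kernel settings. However, the connections in this graph are somewhat cluttered, which makes it hard to take in and memorize the overall structure. In later sections we will draw a much clearer network diagram.
Netron
Netron is another commonly used tool for visualizing network structures. The official .pt model is not visualized well, because the complete network structure cannot be displayed. The model therefore has to be converted to ONNX format first. The result is shown in the figure below (I did not actually refer to this figure while reading the source code and drawing the structure).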
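Source code
Since the goal is to learn the network structure, reading the source code is the core task. The key files are `yolo.py` and `common.py`. Reading the source should achieve the following goals: (1) a thorough understanding of the overall execution logic; (2) knowing each layer's input and output shapes and its convolution settings (kernel size, stride, padding); (3) understanding how each network module is implemented; (4) being able to draw the detailed network structure yourself.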

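Network Configuration File

Configuration file
The YOLOv5 network structure is defined in `.yaml` files, for example ~/yolov5/models/yolov5s.yaml. The file defines the parameters of every layer, the structure of each network module, the anchors, and every other network-related value, as shown in the figure below:
Parsing the configuration file
Taking `yolov5s.yaml` as an example, the first part of the file is shown below. depth_multiple (scaling factor for the number of layers) and width_multiple (scaling factor for the number of channels) scale the depth and width of the network. YOLOv5 comes as a family of models of different sizes (n, s, m, l, x). `yolov5l` has both factors equal to 1, and every other model scales its depth and width relative to it. Depth is the number of repeated layers in a module, and width is the number of output channels of each layer. Later I will show how these two factors are applied in the network-building code.
Anchors: three groups of anchors, one per detection scale. Each group holds three (width, height) pairs, so every grid cell predicts three boxes per scale. The values are in pixels of the network input image. Before use they are converted to the feature-map units of each level (P3/8, P4/16, P5/32), that is, divided by the corresponding stride.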
```
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
```
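To make the scaling concrete, here is a minimal pure-Python sketch of how a single yaml row would be scaled for each member of the family. It uses the `[-1, 9, C3, [512]]` backbone row shown further down. The sketch assumes the rules we read in `parse_model()` of models/yolo.py (v6.x). A repeat count n > 1 becomes `max(round(n * depth_multiple), 1)`. Output channels are multiplied by width_multiple and rounded up to a multiple of 8. The multiplier table is transcribed from the official n/s/m/l/x yaml files, and `scale_row` is a hypothetical helper.

```python
import math

# (depth_multiple, width_multiple) for the YOLOv5 model family
MULTIPLES = {
    "n": (0.33, 0.25),
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.00, 1.00),
    "x": (1.33, 1.25),
}


def make_divisible(x, divisor=8):
    # Round x up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


def scale_row(n, c2, gd, gw):
    """Scale a row's repeat count n and output channels c2 (hypothetical helper)."""
    n = max(round(n * gd), 1) if n > 1 else n  # a single repeat is never scaled
    return n, make_divisible(c2 * gw, 8)


# The backbone row [-1, 9, C3, [512]] across the family
for name, (gd, gw) in MULTIPLES.items():
    print(name, scale_row(9, 512, gd, gw))
# n (3, 128)
# s (3, 256)
# m (6, 384)
# l (9, 512)
# x (12, 640)
```

Because every model shares one yaml layout, only these two numbers change between n, s, m, l and x.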
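The second part of the file, shown below, defines the backbone and head modules. Note that this part is identical for n, s, m, l and x. The only differences are the depth_multiple (number of layers) and width_multiple (number of channels) values above, so when building the network the code simply multiplies by the corresponding factor. The content is as follows: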
```
# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
```
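The fields in the file above mean the following:
- `-1` (from): where the layer's input comes from. -1 means the previous layer, and the builder uses it to look up the input channel count;
- `1` (number): how many times the module is repeated, before scaling by depth_multiple (for C3 this becomes the number of Bottlenecks inside it);
- `Conv` (module): the name of the module;
- `[64, 6, 2, 2]` (args): for Conv these are the output channels (64, before width scaling), kernel size 6, stride 2 and padding 2. If only three numbers are given, padding is not 0. It defaults to None, and `autopad()` fills in k // 2, which gives padding 1 for a 3×3 kernel;
- `[-1, 1, SPPF, [1024, 5]]`: the SPPF module, with 1024 output channels (before width scaling) and a max-pooling kernel size of 5, padded by 5 // 2 = 2;
- `[[-1, 6], 1, Concat, [1]]`: `[-1, 6]` lists the layers to concatenate, here the previous layer and layer 6. The argument `1` is the concatenation dimension, which is the channel dimension.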
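Because the padding default is easy to get wrong, here is a small sketch that walks the yolov5s backbone for a 640×640 input and prints each layer's output shape. It makes three assumptions. `autopad` mirrors the k // 2 default in common.py. Channels are already multiplied by width_multiple = 0.5. C3 and SPPF are modeled as blocks that keep H×W, because all their convolutions and pools use stride 1 with k // 2 padding.

```python
def autopad(k, p=None):
    # Same default as common.py: pad k // 2 when no padding is given
    return k // 2 if p is None else p


def conv_out(size, k, s, p=None):
    # Standard output-size formula for a convolution (or pooling) layer
    return (size + 2 * autopad(k, p) - k) // s + 1


# yolov5s backbone rows with channels already scaled: (module, c_out, k, s, p)
# C3 and SPPF keep the spatial size, so they are modeled as k=1, s=1
ROWS = [
    ("Conv", 32, 6, 2, 2), ("Conv", 64, 3, 2, None), ("C3", 64, 1, 1, None),
    ("Conv", 128, 3, 2, None), ("C3", 128, 1, 1, None),
    ("Conv", 256, 3, 2, None), ("C3", 256, 1, 1, None),
    ("Conv", 512, 3, 2, None), ("C3", 512, 1, 1, None),
    ("SPPF", 512, 1, 1, None),
]

size = 640  # square input image
shapes = []
for i, (name, c_out, k, s, p) in enumerate(ROWS):
    size = conv_out(size, k, s, p)
    shapes.append((i, name, c_out, size, size))

for row in shapes:
    print(row)  # (layer index, module, channels, H, W)
```

Layers 4, 6 and 9 come out at 80×80, 40×40 and 20×20, which correspond to strides 8, 16 and 32. Layers 4 and 6 are exactly the ones the head concatenates through `[-1, 4]` and `[-1, 6]`.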

The figure below summarizes the meaning of each field more concisely:

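Network Construction

When the network is initialized, its configuration can be obtained in two ways: (1) from the pretrained model `yolov5s.pt`; (2) from the `yolov5s.yaml` file.

train.py L117: when pretrained weights are used, the network structure information is taken from the checkpoint.
train.py L127: otherwise the cfg file ("yolov5s.yaml") is passed in. The arguments are passed in train.py (L114~L128), as shown in the figure below.
L114~L128: the complete code is shown in the figure below: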


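Next, we analyze the model class (class Model(nn.Module)) piece by piece.

Initialization: if cfg is already a dict (for example, the model configuration stored in a pretrained checkpoint), it is used directly. Otherwise the .yaml file is opened and parsed with yaml.safe_load: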

```
from pathlib import Path

import torch.nn as nn


class Model(nn.Module):
    # YOLOv5 model
    def __init__(self, cfg='yolov5s.yaml', ch=3, nc=None, anchors=None):  # model, input channels, number of classes
        super().__init__()
        if isinstance(cfg, dict):
            self.yaml = cfg  # model dict (e.g. taken from a pretrained checkpoint)
        else:  # is *.yaml
            import yaml  # for torch hub
            self.yaml_file = Path(cfg).name
            with open(cfg, encoding='ascii', errors='ignore') as f:
                self.yaml = yaml.safe_load(f)  # model dict
```

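Defining the model
After the configuration file is loaded, its content is used to build the network. The core function is parse_model(). It returns the list of layers (self.model) and the save list (self.save), which holds the indices of layers whose outputs are needed again later: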

```
        # Define model
        ch = self.yaml['ch'] = self.yaml.get('ch', ch)  # input channels
        if nc and nc != self.yaml['nc']:
            LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}")
            self.yaml['nc'] = nc  # override yaml value
        if anchors:
            LOGGER.info(f'Overriding model.yaml anchors with anchors={anchors}')
            self.yaml['anchors'] = round(anchors)  # override yaml value
        self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch])  # model, savelist
        self.names = [str(i) for i in range(self.yaml['nc'])]  # default names
        self.inplace = self.yaml.get('inplace', True)
```
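parse_model() itself is not reproduced here. The function below is a rough, torch-free sketch of what it does with each yaml row. It assumes the v6.x logic as we read it: input channels are looked up through a running `ch` list, width and depth scaling are applied, the repeat count is moved into C3's own arguments, and Concat sums its inputs' channels. `build_args` is a hypothetical helper. Modules are represented by name strings, and Detect rows are not handled.

```python
import math

GD, GW = 0.33, 0.50  # yolov5s depth_multiple, width_multiple


def make_divisible(x, divisor=8):
    # Round x up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


def build_args(row, ch):
    """Turn one yaml row [from, number, module, args] into (module, args, c2).

    ch holds the output channels of the layers built so far (hypothetical helper).
    """
    f, n, m, args = row
    n = max(round(n * GD), 1) if n > 1 else n  # depth scaling
    if m in ("Conv", "C3", "SPPF"):
        c1, c2 = ch[f], make_divisible(args[0] * GW, 8)  # width scaling
        args = [c1, c2, *args[1:]]
        if m == "C3":
            args.insert(2, n)  # C3 takes the repeat count as its own argument
    elif m == "Concat":
        c2 = sum(ch[x] for x in f)  # channels of all inputs add up
    else:  # e.g. nn.Upsample keeps the channel count
        c2 = ch[f]
    return m, args, c2


rows = [[-1, 1, "Conv", [64, 6, 2, 2]], [-1, 1, "Conv", [128, 3, 2]], [-1, 3, "C3", [128]]]
ch, built = [3], []  # ch starts with the image channels
for i, row in enumerate(rows):
    m, args, c2 = build_args(row, ch)
    if i == 0:
        ch = []  # from here on, ch[j] is the output channel count of layer j
    ch.append(c2)
    built.append((m, args))

print(built)  # [('Conv', [3, 32, 6, 2, 2]), ('Conv', [32, 64, 3, 2]), ('C3', [64, 64, 1])]
```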

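Initializing parameters
This part mainly handles the anchors and initializes the network's weights and biases. Pay particular attention to self._initialize_biases(). The strides are found by running a forward pass on a 256×256 all-zero input and dividing 256 by the spatial size of each Detect output. The anchors, given in pixels, are then divided by these strides so that they are expressed in grid units: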

```
        # Build strides, anchors
        m = self.model[-1]  # Detect()
        if isinstance(m, Detect):
            s = 256  # 2x min stride
            m.inplace = self.inplace
            m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])  # forward
            check_anchor_order(m)  # must be in pixel-space (not grid-space)
            m.anchors /= m.stride.view(-1, 1, 1)
            self.stride = m.stride
            self._initialize_biases()  # only run once

        # Init weights, biases
        initialize_weights(self)
        self.info()
        LOGGER.info('')
```
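Here is the stride and anchor arithmetic above, written out for yolov5s in plain Python. With a 256×256 dummy input, the three Detect outputs (P3, P4, P5) are 32, 16 and 8 cells wide, so the strides come out as 8, 16 and 32. Dividing by them converts the anchors to grid units. The objectness prior at the end assumes the `math.log(8 / (640 / s) ** 2)` expression we read in `_initialize_biases()`, which corresponds to about 8 objects per 640×640 image. Treat it as our reading of the source rather than a quote.

```python
import math

S = 256  # side of the dummy input passed to self.forward()
feature_sizes = [32, 16, 8]  # spatial size of the P3, P4, P5 Detect outputs for a 256x256 input
strides = [S / n for n in feature_sizes]  # like s / x.shape[-2]

# Anchors from yolov5s.yaml, in input-image pixels, one list per level
anchors_px = [
    [(10, 13), (16, 30), (33, 23)],       # P3/8
    [(30, 61), (62, 45), (59, 119)],      # P4/16
    [(116, 90), (156, 198), (373, 326)],  # P5/32
]

# m.anchors /= m.stride.view(-1, 1, 1): every level is divided by its own stride
anchors_grid = [[(w / st, h / st) for w, h in level] for level, st in zip(anchors_px, strides)]

# Objectness bias prior (assumed formula): about 8 objects spread over (640 / stride)**2 cells
obj_bias = [math.log(8 / (640 / st) ** 2) for st in strides]

print(strides)          # [8.0, 16.0, 32.0]
print(anchors_grid[0])  # [(1.25, 1.625), (2.0, 3.75), (4.125, 2.875)]
```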

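Network Inference

To obtain each layer's inputs and outputs, its convolution kernel settings, and the execution logic, we need to run the network forward. In train.py the entry point is in the main program (L351~L353), as shown in the figure below: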

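Next, we introduce the core modules of the network design.

Core inference loop
The snippet below loops over the network modules (m) and runs each one in turn. m.f gives the source of a module's input. -1 means the previous layer. Any other value means the input is fetched from the saved outputs in y, and Concat and Detect use a list of layer indices: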

```
    def _forward_once(self, x, profile=False, visualize=False):
        y, dt = [], []  # outputs
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x
```
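The `y` list only keeps the outputs listed in `self.save`, which parse_model() builds from the `from` fields. The sketch below replays that bookkeeping for yolov5s without torch. `save_list` follows the expression we read in parse_model() (v6.x), and `trace` mirrors the loop above with strings standing in for tensors. Both helpers are hypothetical illustrations, not YOLOv5 functions.

```python
# 'from' field of each yolov5s layer (0-24), transcribed from yolov5s.yaml
FROM = [-1] * 25
FROM[12], FROM[16], FROM[19], FROM[22] = [-1, 6], [-1, 4], [-1, 14], [-1, 10]  # Concat layers
FROM[24] = [17, 20, 23]  # Detect


def save_list(froms):
    """Layers whose outputs are read later, computed the way parse_model() does."""
    save = []
    for i, f in enumerate(froms):
        # x % i turns a negative index into an absolute one; -1 is skipped
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)
    return sorted(save)


def trace(froms, save):
    """Replay the routing of _forward_once with strings instead of tensors."""
    y, x, reads = [], "image", []
    for i, f in enumerate(froms):
        if f != -1:  # input comes from earlier layers
            x = y[f] if isinstance(f, int) else [x if j == -1 else y[j] for j in f]
        reads.append(x)
        x = f"out{i}"  # stands in for x = m(x)
        y.append(x if i in save else None)  # keep only outputs that are needed later
    return reads, y


save = save_list(FROM)
reads, y = trace(FROM, save)
print(save)       # [4, 6, 10, 14, 17, 20, 23]
print(reads[12])  # ['out11', 'out6']
print(reads[24])  # ['out17', 'out20', 'out23']
```

Dropping unneeded outputs (the `None` entries) keeps memory low. This only works because every index a later layer reads is in the save list.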

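Conv module
This is the basic building block of the network. It consists of three parts: convolution (conv), batch normalization (bn) and the SiLU activation. The implementation is as follows: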

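```
import torch.nn as nn


def autopad(k, p=None):  # kernel, padding
    # Default padding k // 2 keeps H and W unchanged for odd k at stride 1
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution: Conv2d -> BatchNorm2d -> activation (SiLU by default)
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)  # no bias: BN follows
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        # Used once BN has been folded into self.conv (fused model)
        return self.act(self.conv(x))
```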
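`forward_fuse()` skips `self.bn` because of what happens when the model is fused for inference (`Model.fuse()`). As we read the source, YOLOv5 then folds each BatchNorm into the preceding convolution's weight and bias using `fuse_conv_and_bn` in utils/torch_utils.py, and switches `forward` to `forward_fuse`. The scalar sketch below checks the folding algebra for a single output channel of a 1×1 conv. It illustrates the math only and is not YOLOv5's tensor code. `fold_bn` is a hypothetical helper.

```python
import math

EPS = 1e-5  # BatchNorm epsilon (PyTorch's default); it only needs to match on both sides


def silu(x):
    # SiLU(x) = x * sigmoid(x), the default activation in Conv
    return x / (1.0 + math.exp(-x))


def fold_bn(w, gamma, beta, mean, var, eps=EPS):
    """Fold one output channel's BatchNorm into the conv weight and bias."""
    std = math.sqrt(var + eps)
    return w * gamma / std, beta - gamma * mean / std


# One output channel of a 1x1 conv reading a single input value (scalars for clarity)
w, x = 0.8, 2.0
gamma, beta, mean, var = 1.5, 0.1, 0.4, 0.25  # BN affine parameters and running statistics

unfused = silu(gamma * (w * x - mean) / math.sqrt(var + EPS) + beta)  # act(bn(conv(x)))
w_f, b_f = fold_bn(w, gamma, beta, mean, var)
fused = silu(w_f * x + b_f)  # act(conv(x)) with the folded weight and bias
print(unfused, fused)  # equal up to floating-point rounding
```

The folded conv needs a bias even though the original `nn.Conv2d` was created with `bias=False`, because BN's shift moves into it.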

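C3 module in the backbone
The C3 module contains Bottleneck submodules. The input goes through two 1×1 convolution branches: cv1 followed by the Bottlenecks, and cv2. Their outputs are concatenated along the channel dimension and fused by cv3: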

```
import torch
import torch.nn as nn

# Conv and Bottleneck are defined in common.py (both are shown in this article)


class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))
```

The visualized structure is shown below:

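To check the channel bookkeeping in C3, the sketch below counts trainable parameters. In C3, cv1 and cv2 each output c_ = c2 * e channels, the concat has 2 * c_, and cv3 maps back to c2. The count assumes that Conv means a Conv2d without bias plus a BatchNorm2d with 2 parameters per channel (running statistics are buffers, not parameters), and that g = 1. The helper names are hypothetical. You can compare the result with the params column that YOLOv5 prints when it builds the model.

```python
def conv_params(c1, c2, k=1):
    # Conv2d(bias=False) weights + BatchNorm2d weight and bias (groups = 1)
    return c1 * c2 * k * k + 2 * c2


def bottleneck_params(c1, c2, e=0.5):
    c_ = int(c2 * e)  # hidden channels
    return conv_params(c1, c_, 1) + conv_params(c_, c2, 3)  # cv1 (1x1), cv2 (3x3)


def c3_params(c1, c2, n=1, e=0.5):
    c_ = int(c2 * e)  # hidden channels
    return (conv_params(c1, c_, 1)                   # cv1
            + conv_params(c1, c_, 1)                 # cv2
            + conv_params(2 * c_, c2, 1)             # cv3, after the concat
            + n * bottleneck_params(c_, c_, e=1.0))  # m: n Bottlenecks


# yolov5s layers 2 and 4: C3 [64, 64, 1] and C3 [128, 128, 2]
print(c3_params(64, 64, 1))    # 18816
print(c3_params(128, 128, 2))  # 115712
```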

SPPF module

```
import warnings

import torch
import torch.nn as nn

# Conv is the class shown in the Conv module section above


class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)  # one pool
            y2 = self.m(y1)  # two chained pools
            return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))  # 4 branches -> 4 * c_ channels
```

The visualized structure is shown below:
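The comment `equivalent to SPP(k=(5, 9, 13))` holds because chaining stride-1 5×5 max pools widens the window: two pools cover 9 positions and three cover 13. A 1D pure-Python sketch is enough to check this, since a k×k max pool is the same operation applied along each axis. `maxpool1d` is a hypothetical helper. It pads with -inf, which matches PyTorch's implicit negative-infinity padding for max pooling.

```python
import random

NEG_INF = float("-inf")


def maxpool1d(x, k):
    """Stride-1 max pooling with padding k // 2; padded positions never win."""
    p = k // 2
    padded = [NEG_INF] * p + list(x) + [NEG_INF] * p
    return [max(padded[i:i + k]) for i in range(len(x))]


random.seed(0)
x = [random.randint(-50, 50) for _ in range(30)]

y1 = maxpool1d(x, 5)  # what self.m(x) does along one axis
y2 = maxpool1d(y1, 5)
y3 = maxpool1d(y2, 5)

print(y2 == maxpool1d(x, 9), y3 == maxpool1d(x, 13))  # True True
```

SPPF reuses y1 and y2, so it needs three 5×5 pools, whereas SPP runs separate 5, 9 and 13 pools. This reuse is why SPPF is faster.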

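Upsample
To fuse features at different scales, the feature map's spatial size (H, W) is enlarged by a fixed upsampling factor while the number of channels stays the same. PyTorch's built-in module is used: nn.Upsample(scale_factor=2.0, mode='nearest'), built from the yaml arguments [None, 2, 'nearest'].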
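For mode='nearest' with an integer scale factor, each output pixel copies the nearest input pixel, so every value is repeated scale × scale times. Here is a minimal sketch on one channel stored as nested lists. The helper is hypothetical. PyTorch applies the same rule independently to every channel, which is why the channel count does not change.

```python
def upsample_nearest(fmap, scale=2):
    """Nearest-neighbour upsampling of one H x W channel by an integer factor."""
    return [[row[j // scale] for j in range(len(row) * scale)]  # repeat each column
            for row in fmap
            for _ in range(scale)]                              # repeat each row


print(upsample_nearest([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

In yolov5s, this step turns the 20×20 map of layer 10 into the 40×40 map that is concatenated with layer 6.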

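Concat
Concatenates along the channel dimension. The spatial size (H, W) of the feature maps is unchanged, and the channel counts of the inputs are added together. The code is as follows: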

```
import torch
import torch.nn as nn


class Concat(nn.Module):
    # Concatenate a list of tensors along dimension
    def __init__(self, dimension=1):
        super().__init__()
        self.d = dimension

    def forward(self, x):
        return torch.cat(x, self.d)  # x is a list of tensors; dim 1 = channels
```
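For NCHW tensors, `torch.cat(..., 1)` requires N, H and W to match and adds up C. Here is a minimal sketch of that shape rule. The helper is hypothetical, and the shapes assume yolov5s with a 640×640 input and batch size 1.

```python
def concat_shape(shapes, dim=1):
    """Shape of torch.cat(tensors, dim) given the input shapes (N, C, H, W)."""
    first = shapes[0]
    for s in shapes[1:]:
        if len(s) != len(first) or any(a != b for d, (a, b) in enumerate(zip(s, first)) if d != dim):
            raise ValueError(f"shapes {first} and {s} differ outside dim {dim}")
    out = list(first)
    out[dim] = sum(s[dim] for s in shapes)  # only the concatenated dim grows
    return tuple(out)


# yolov5s layer 12: upsampled layer 11 (256 ch) + backbone layer 6 (256 ch), both 40x40
print(concat_shape([(1, 256, 40, 40), (1, 256, 40, 40)]))  # (1, 512, 40, 40)
```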

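C3 module in the head
The C3 class is the one given in the backbone C3 section above. The difference lies in the Bottleneck. In the network configuration file, every C3 in the head has shortcut set to False. In the line `self.add = shortcut and c1 == c2` below, self.add is therefore False, and the Bottleneck has no residual connection.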

```
import torch.nn as nn

# Conv is the class shown in the Conv module section above


class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2  # residual only if enabled and shapes match

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
```

The visualized structure is shown below:

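Detect module
This module produces the network's outputs. The network part is simple: one 1×1 convolution per detection level (three in total, for P3, P4 and P5), each producing na × (nc + 5) channels for its own scale. At inference time, the raw outputs are also decoded into boxes in input-image pixels.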

```
import torch
import torch.nn as nn


class Detect(nn.Module):
    stride = None  # strides computed during build
    onnx_dynamic = False  # ONNX export parameter
    export = False  # export mode

    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
        super().__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [torch.zeros(1)] * self.nl  # init grid
        self.anchor_grid = [torch.zeros(1)] * self.nl  # init anchor grid
        self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv
        self.inplace = inplace  # use in-place ops (e.g. slice assignment)

    def forward(self, x):
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

                y = x[i].sigmoid()
                if self.inplace:
                    y[..., 0:2] = (y[..., 0:2] * 2 + self.grid[i]) * self.stride[i]  # xy
                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy, wh, conf = y.split((2, 2, self.nc + 1), 4)  # y.tensor_split((2, 4, 5), 4)  # torch 1.8.0
                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
                    y = torch.cat((xy, wh, conf), 4)
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)

    def _make_grid(self, nx=20, ny=20, i=0):
        # Called in forward(); sketch of the v6.1+ helper, assumes torch>=1.10 for indexing='ij'.
        # Returns cell coordinates shifted by -0.5 and pixel-space anchors, both (1, na, ny, nx, 2).
        d, t = self.anchors[i].device, self.anchors[i].dtype
        shape = 1, self.na, ny, nx, 2  # grid shape
        yv, xv = torch.meshgrid(torch.arange(ny, device=d, dtype=t), torch.arange(nx, device=d, dtype=t), indexing='ij')
        grid = torch.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5
        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)
        return grid, anchor_grid
```
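The decoding lines in `forward()` can be written out for a single prediction. Below is a pure-Python sketch. It assumes, as in v6.1+, that `self.grid` already holds the cell coordinates minus 0.5, so that `y * 2 + grid` equals `2 * sigmoid - 0.5 + cell index`. `decode` is a hypothetical helper. When all raw outputs are 0, the box is centered in its cell and has exactly the anchor's size. The width/height factor is bounded to the range (0, 4) times the anchor.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode(t, gx, gy, stride, anchor_wh):
    """Decode one anchor's raw outputs (tx, ty, tw, th) in grid cell (gx, gy).

    anchor_wh is in input-image pixels (what anchor_grid holds); the -0.5
    offset is written out here instead of being folded into the grid.
    """
    tx, ty, tw, th = t
    x = (2 * sigmoid(tx) - 0.5 + gx) * stride
    y = (2 * sigmoid(ty) - 0.5 + gy) * stride
    w = (2 * sigmoid(tw)) ** 2 * anchor_wh[0]
    h = (2 * sigmoid(th)) ** 2 * anchor_wh[1]
    return x, y, w, h


nc, na = 80, 3
no = nc + 5       # x, y, w, h, objectness + 80 class scores
out_ch = na * no  # channels of each 1x1 output conv
n_boxes = sum(na * (640 // s) ** 2 for s in (8, 16, 32))  # rows of torch.cat(z, 1) at 640x640

print(out_ch, n_boxes)                           # 255 25200
print(decode((0, 0, 0, 0), 10, 5, 8, (10, 13)))  # (84.0, 44.0, 10.0, 13.0)
```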