YOLO系列解读DAY1—YOLOV1预训练模型

news2025/4/8 13:48:46

一、说在前面

小伙伴们好，博主很久没有写博客了，略感生疏，不到之处敬请谅解，欢迎指出文中错误，大家一起探讨。欲看视频讲解，可转至博主DouYin、B站，欢迎关注，链接如下：

Github： samylee (samylee) · GitHub

DY： samylee_csdn

B站：samylee

二、写作初衷

YOLO系列详解博客网上很多且质量很高，但大多数都是已文字和图像的形式呈现，未能结合代码做进一步阐述。且yolov1/v2是纯c代码，调试困难，无法直观反映训练和测试过程。所以博主将该代码转到PyTorch下，可一目了然理清Darknet作者的算法思路。话不多说，下面开始！

三、YOLOV1预训练模型

众所周知，在当时预训练模型的性能会直接影响整个检测网络的效果，所以设计出一款好的分类模型是做检测任务的必经之路。分类网络原理部分这里不做进一步阐述了，博主会开设另外系列博客讲解分类网络，敬请期待。

Darknet作者采用自研的extraction网络，在当时已经达到SOTA水平，且速度明显优于同时期SOTA模型，性能如下图所示。

网络设计方面作者采用直到型架构，因当时何恺明等大神尚未提出残差网络，所以直到型还是主流网络架构，如下图所示，其中'M'表示最大池化层，'A'表示平均池化层。

四、Extraction网络转PyTorch

网络转换需要注意三个地方：

1、要求将Extraction网络按照其特定的网络序列复写出PyTorch的网络架构，切不可心浮气躁；

2、Darknet存储参数的序列和PyTorch稍有不同，若遇到BatchNorm架构的卷积层，Darknet会先存储BatchNorm层的参数，进而存储卷积层的参数；

3、图像数据预处理部分，Darknet分类网络的顺序是：BGR2RGB->Norm(1/255)->Resize->Crop，所以我们在复现的时候一定要按照该顺序进行数据预处理操作。

网络复现如下，版本PyTorch1.4，Python3.6，博主暂未加PyTorch的训练代码，因ImageNet训练时间较长，所以直接使用了Extraction的网络模型，若有需要的小伙伴可以评论或私信给博主，博主可添加Extraction分类网络的PyTorch的训练方法。

import torch
import torch.nn as nn
import torch.nn.functional as F
import cv2
import numpy as np


class ExtractionNet(nn.Module):
    def __init__(self, in_channels=3, num_classes=1000, init_weights=True, phase='train'):
        super(ExtractionNet, self).__init__()
        self._phase = phase
        self.features = self.make_layers(in_channels=in_channels, out_channels=num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        for feature in self.features:
            x = feature(x)

        if self._phase == 'test':
            x = torch.flatten(x, 1)
            x = F.softmax(x, dim=-1)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='leaky_relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def make_layers(self, in_channels=3, out_channels=1000):
        # conv: out_channels, kernel_size, stride, batchnorm, activate
        # maxpool: kernel_size stride
        params = [
            [64, 7, 2, True, 'leaky'],
            ['M', 2, 2],
            [192, 3, 1, True, 'leaky'],
            ['M', 2, 2],
            [128, 1, 1, True, 'leaky'],
            [256, 3, 1, True, 'leaky'],
            [256, 1, 1, True, 'leaky'],
            [512, 3, 1, True, 'leaky'],
            ['M', 2, 2],
            [256, 1, 1, True, 'leaky'],
            [512, 3, 1, True, 'leaky'],
            [256, 1, 1, True, 'leaky'],
            [512, 3, 1, True, 'leaky'],
            [256, 1, 1, True, 'leaky'],
            [512, 3, 1, True, 'leaky'],
            [256, 1, 1, True, 'leaky'],
            [512, 3, 1, True, 'leaky'],
            [512, 1, 1, True, 'leaky'],
            [1024, 3, 1, True, 'leaky'],
            ['M', 2, 2],
            [512, 1, 1, True, 'leaky'],
            [1024, 3, 1, True, 'leaky'],
            [512, 1, 1, True, 'leaky'],
            [1024, 3, 1, True, 'leaky'],
            [out_channels, 1, 1, False, 'leaky'], # classifier
            ['A']
        ]

        module_list = nn.ModuleList()
        for i, v in enumerate(params):
            modules = nn.Sequential()
            if v[0] == 'M':
                modules.add_module(f"maxpool_{i}", nn.MaxPool2d(kernel_size=v[1], stride=v[2], padding=int((v[1] - 1) // 2)))
            elif v[0] == 'A':
                modules.add_module(f"avgpool_{i}", nn.AdaptiveAvgPool2d((1, 1)))
            else:
                modules.add_module(
                    f"conv_{i}",
                    nn.Conv2d(
                        in_channels,
                        v[0],
                        kernel_size=v[1],
                        stride=v[2],
                        padding=(v[1] - 1) // 2,
                        bias=not v[3]
                    )
                )
                if v[3]:
                    modules.add_module(f"bn_{i}", nn.BatchNorm2d(v[0]))
                modules.add_module(f"act_{i}", nn.LeakyReLU(0.1) if v[4] == 'leaky' else nn.ReLU())
                in_channels = v[0]
            module_list.append(modules)
        return module_list


def load_darknet_weights(model, weights_path):
    # Open the weights file
    with open(weights_path, "rb") as f:
        # First five are header values
        header = np.fromfile(f, dtype=np.int32, count=4)
        header_info = header  # Needed to write header when saving weights
        seen = header[3]  # number of images seen during training
        weights = np.fromfile(f, dtype=np.float32)  # The rest are weights

    ptr = 0
    for module in model.features:
        if isinstance(module[0], nn.Conv2d):
            conv_layer = module[0]
            if isinstance(module[1], nn.BatchNorm2d):
                # Load BN bias, weights, running mean and running variance
                bn_layer = module[1]
                num_b = bn_layer.bias.numel()  # Number of biases
                # Bias
                bn_b = torch.from_numpy(
                    weights[ptr: ptr + num_b]).view_as(bn_layer.bias)
                bn_layer.bias.data.copy_(bn_b)
                ptr += num_b
                # Weight
                bn_w = torch.from_numpy(
                    weights[ptr: ptr + num_b]).view_as(bn_layer.weight)
                bn_layer.weight.data.copy_(bn_w)
                ptr += num_b
                # Running Mean
                bn_rm = torch.from_numpy(
                    weights[ptr: ptr + num_b]).view_as(bn_layer.running_mean)
                bn_layer.running_mean.data.copy_(bn_rm)
                ptr += num_b
                # Running Var
                bn_rv = torch.from_numpy(
                    weights[ptr: ptr + num_b]).view_as(bn_layer.running_var)
                bn_layer.running_var.data.copy_(bn_rv)
                ptr += num_b
            else:
                # Load conv. bias
                num_b = conv_layer.bias.numel()
                conv_b = torch.from_numpy(weights[ptr: ptr + num_b]).view_as(conv_layer.bias)
                conv_layer.bias.data.copy_(conv_b)
                ptr += num_b
            # Load conv. weights
            num_w = conv_layer.weight.numel()
            conv_w = torch.from_numpy(weights[ptr: ptr + num_w]).view_as(conv_layer.weight)
            conv_layer.weight.data.copy_(conv_w)
            ptr += num_w


if __name__ == '__main__':
    # load moel
    checkpoint_path = "extraction.weights"
    model = ExtractionNet(phase='test')
    load_darknet_weights(model, checkpoint_path)
    model.eval()

    # model input size
    net_size = 224

    # load img
    img = cv2.imread('dog.jpg')
    # img bgr2rgb
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # resize img
    h, w, _ = img_rgb.shape
    new_h, new_w = h, w
    if w < h:
        new_h = (h * net_size) // w
        new_w = net_size
    else:
        new_w = (w * net_size) // h
        new_h = net_size
    img_resize = cv2.resize(img_rgb, (new_w, new_h))

    # crop img
    cut_w = (new_w - net_size) // 2
    cut_h = (new_h - net_size) // 2
    img_crop = img_resize[cut_h:cut_h + net_size, cut_w:cut_w + net_size, :]

    # float
    img_crop = torch.from_numpy(img_crop.transpose((2, 0, 1)))
    img_float = img_crop.float().div(255).unsqueeze(0)

    # forward
    result = model(img_float)

    print(result.topk(5))