Understanding the ResNet Model from the Ground Up, with a PyTorch Implementation

Review the old to learn the new.

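I. References

Original paper: Deep Residual Learning for Image Recognition
Follow-up paper: Identity Mappings in Deep Residual Networks
ResNet explained, with a PyTorch implementation
Official PyTorch implementation of ResNet
[pytorch] ResNet18, ResNet20, ResNet34, ResNet50: network structure and implementation
Notes on the residual network ResNet
ResNet explained and implemented
Highway Networks
Rereading "Deep Residual Learning for Image Recognition": a closer look at residual networks (with PyTorch code)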

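II. Background

1. Deep networks

A deeper network has more expressive power. A convolution kernel extracts one kind of image feature, and a single kernel can capture only that one feature, so a network needs many kernels, each responding to a different feature. With enough kernels and enough parameters, the model can represent the features of the input image well.

Deep networks therefore have two advantages:

The level of the extracted features rises with depth.
A deeper network has greater expressive power.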

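2. Model naming

Many architectures are named with a name followed by a number, where the number is the network depth. Depth counts the layers that carry weights, that is, convolutional and fully connected layers; pooling and BN layers are not counted.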

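3. Batch normalization (BN) layer

Batch Normalization (BN) normalizes the feature maps of a batch so that each channel has mean 0 and variance 1.

Images are usually standardized during preprocessing, which speeds up convergence. For Conv1, the input is therefore a feature matrix that follows a known distribution; for Conv2, the input feature map no longer necessarily does. (Here "follows a distribution" does not refer to the feature map of a single sample; in theory it refers to the feature maps of the whole training set.) BN restores this property by normalizing each feature map channel to mean 0 and variance 1, using the statistics of the current batch as an estimate.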
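As a minimal sketch of what BN does to a batch of feature maps, the following normalizes a batch with `nn.BatchNorm2d` in training mode and checks the per-channel statistics. The tensor sizes and the scale and shift applied to the input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A batch of 8 feature maps with 3 channels, deliberately far from mean 0 / variance 1.
x = torch.randn(8, 3, 16, 16) * 5.0 + 2.0

bn = nn.BatchNorm2d(num_features=3)
bn.train()  # training mode: normalize with the statistics of the current batch
y = bn(x)

# Statistics are computed per channel, over the batch and spatial dimensions.
channel_mean = y.mean(dim=(0, 2, 3))
channel_var = y.var(dim=(0, 2, 3), unbiased=False)
print(channel_mean)  # each entry close to 0
print(channel_var)   # each entry close to 1
```

A freshly constructed layer has scale 1 and shift 0, so the output shows the normalized statistics directly; once the scale and shift are trained, the output statistics can move away from 0 and 1.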

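III. Introduction to ResNet

ResNet in detail

The deep residual network (ResNet) was proposed by Microsoft Research in 2015. That year it took first place in the ImageNet classification and object detection tasks, and first place in object detection and image segmentation on the COCO dataset. ResNet is a milestone in the history of CNNs for vision; on the strength of its results on public benchmarks, the paper by Kaiming He (何恺明) and his co-authors received the CVPR 2016 Best Paper Award.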

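1. Introduction

Why does depth matter?

A CNN extracts low-, mid-, and high-level features; the more layers it has, the richer the range of feature levels it can extract. Features from deeper layers are also more abstract and carry more semantic information.

Why not simply add more layers?

Before ResNet, convolutional networks were built by stacking a series of convolutional and pooling layers. The intuition is that a deeper network captures richer features and should therefore perform better.

Experiments show otherwise. Conventional convolutional and fully connected networks lose some information as it passes from layer to layer, and naively increasing depth leads to the degradation problem as well as vanishing or exploding gradients, which can make very deep networks untrainable.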

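1.1 Vanishing and exploding gradients

As a network gets deeper, gradients can vanish or explode during backpropagation. Backpropagation adjusts the network weights (kernel values, hidden-layer weights, and biases). To do so, it computes the partial derivative of the objective function (for example, the sum of squared differences between predictions and targets) with respect to the weights of every layer. These derivatives come from the chain rule, so each gradient is a long product of per-layer factors; the product can shrink or grow dramatically, which hinders convergence.

If every per-layer factor is smaller than 1, each step back toward the input multiplies the gradient by a number smaller than 1; the deeper the network, the more such factors there are and the closer the gradient gets to 0. This is the vanishing gradient problem. Conversely, if every factor is larger than 1, the gradient grows with depth; this is the exploding gradient problem.

Vanishing gradient: 0.99^1000 ≈ 0.00004317

Exploding gradient: 1.01^1000 ≈ 20959.16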
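The two numbers can be reproduced with a few lines of Python. This is only a sketch of the compounding effect: it assumes every layer contributes the same factor, which real networks do not, and the function name is made up for illustration.

```python
def compounded_gradient(per_layer_factor: float, num_layers: int) -> float:
    """Product of identical per-layer factors along a chain of num_layers layers."""
    return per_layer_factor ** num_layers

vanishing = compounded_gradient(0.99, 1000)
exploding = compounded_gradient(1.01, 1000)

print(f"{vanishing:.3e}")  # on the order of 4e-05
print(f"{exploding:.3e}")  # on the order of 2e+04
```

A deviation of only 1% per layer is enough to change the gradient by nine orders of magnitude between the two cases over 1000 layers.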

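Solution

According to the ResNet paper, vanishing and exploding gradients are largely addressed by data preprocessing (standardization), normalized weight initialization, and BN (Batch Normalization) layers inside the network.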

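1.2 The degradation problem

As a network gets deeper, it becomes harder to train and to optimize. In theory a deeper network should perform better; in practice, accuracy saturates as layers are added and then drops, so a very deep network ends up worse than a shallower one. This is called the degradation problem. The ResNet paper points out that it is not caused by overfitting, because the deeper plain network also has a higher training error.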

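Solution

To counter degradation in deep networks, some layers are connected not only to the next layer but also to a layer further ahead, skipping the layers in between and weakening the strict layer-to-layer coupling. Networks built this way are called residual networks (ResNets). The ResNet paper proposes the residual structure to mitigate degradation; in its experiments on convolutional networks with residual structures, performance no longer gets worse as depth increases but keeps improving. (In the original plot, dashed lines are the training error and solid lines are the test error.)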

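Compared with VGGNet, the biggest difference in ResNet is its many bypass paths that feed the input directly to later layers. These paths are called shortcuts or skip connections.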

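2. Residual mapping

Compare a regular block with a residual block. Let the input be x and the desired mapping be f(x). The regular block must fit f(x) directly, whereas the weight layers inside the residual block only need to fit the residual mapping f(x) - x, which is usually easier to optimize in practice. In the residual block, if the weights and biases of the weighted operations (for example, affine layers) are set to 0, f(x) becomes the identity mapping. In practice, when the desired mapping f(x) is very close to the identity, the residual mapping also makes it easy to capture small deviations from the identity.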
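The claim that zero weights turn a residual block into the identity mapping can be checked directly. The block below is a hypothetical one-layer residual block written only for this check; it is not one of the ResNet blocks defined later.

```python
import torch
import torch.nn as nn


class TinyResidualBlock(nn.Module):
    """Hypothetical one-layer residual block: output = F(x) + x."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Zero weights and bias make the residual branch F(x) output all zeros.
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x) + x


torch.manual_seed(0)
x = torch.randn(2, 4, 8, 8)
block = TinyResidualBlock(channels=4)
with torch.no_grad():
    out = block(x)

is_identity = torch.equal(out, x)
print(is_identity)  # True
```

A plain block with the same zero weights would output all zeros, so it would have to learn the identity from scratch; the residual block gets it for free and only has to learn the deviation.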

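IV. The residual structure

1. Plain and residual networks

A neural network made up of multiple residual blocks is a residual network.

Experiments show that this structure works well for training very deep networks. To tell the two apart, a network without residual connections is called a plain network.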

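2. Overview of the residual structure

The residual structure uses a shortcut connection that adds feature matrices across layers; the addition is element-wise, between values at the same positions. The shortcut branch x is usually called the identity function and is a skip connection; F(x) is the residual function. F(x) and x must have the same shape. Inside a residual block, the input can travel forward faster along the cross-layer path.

In practice, a shortcut does not have to skip exactly one layer; it can skip several, and the blocks proposed in ResNet do skip several.

A residual network consists of a series of residual blocks. ResNet18/34 use the BasicBlock, which has two 3x3 convolutions. ResNet50/101/152 use the Bottleneck, which has 1x1, 3x3, and 1x1 convolutions.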

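Like VGGNet, ResNet comes in several depths, and the two residual structures correspond to the shallow and deep variants:

Network type	ResNet variants	Residual structure
Shallow	ResNet18/34	BasicBlock
Deep	ResNet50/101/152	Bottleneck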

[Figures: solid-line and dashed-line residual structures for ResNet 18/34 and for ResNet 50/101/152]

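ResNet follows VGG's design of using 3×3 convolutional layers throughout. A BasicBlock has two 3×3 convolutional layers with the same number of output channels, each followed by a BN layer and a ReLU activation. A cross-layer path skips these two convolutions and adds the input right before the final ReLU. This design requires the output of the two convolutional layers to have the same shape as the input so that they can be added. To change the number of channels, an extra 1×1 convolutional layer transforms the input into the required shape before the addition.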

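3. The `BasicBlock` residual structure

The shallower ResNets (18-layer and 34-layer) are built from BasicBlocks, which learn a residual across two layers with 3x3 and 3x3 kernels. A basic block with stride 1 and no downsample branch is an identity block: its output has the same shape as its input, so blocks can be chained one after another.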

import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo

# This file defines five architectures built on the ResNet class
__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
           'resnet152']

# Pretrained weights are available for each architecture
model_urls = {
    'resnet18': 'https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth',
    'resnet34': 'https://s3.amazonaws.com/pytorch/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://s3.amazonaws.com/pytorch/models/resnet50-19c8e357.pth',
    'resnet101': 'https://s3.amazonaws.com/pytorch/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://s3.amazonaws.com/pytorch/models/resnet152-b121ed2d.pth',
}

# Standard 3x3 convolution
def conv3x3(in_planes, out_planes, stride=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class BasicBlock(nn.Module):
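    # Ratio of the block's output channels to planes; 1 means the main branch keeps the channel count
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        # downsample is the shortcut transform used by the dashed-line residual structure
        # inplanes is the number of input channels, planes the number of output channels
        super(BasicBlock, self).__init__()
        # Conv1
        self.conv1 = conv3x3(inplanes, planes, stride)
        # stride=1: solid-line structure, spatial size unchanged; stride=2: dashed-line structure
        # stride=1: output = (input - 3 + 2*1) / 1 + 1 = input, so the shape is unchanged
        # stride=2: output = floor((input - 3 + 2*1) / 2) + 1 = input / 2 for even input
        self.bn1 = nn.BatchNorm2d(planes)  # the conv before BN uses bias=False because BN has its own shift
        self.relu = nn.ReLU(inplace=True)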
        self.relu = nn.ReLU(inplace=True)
        # Conv2
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        # Downsampling transform for the shortcut branch (None for the solid-line structure)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:  # dashed-line structure: the shortcut must be downsampled
            residual = self.downsample(x)  # shortcut branch
        # F(x) + x
        out += residual
        out = self.relu(out)

        return out

In the BasicBlock class, __init__() defines the layers and forward() defines the forward pass; together they implement a residual block.
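To see the solid-line and dashed-line cases side by side, here is a short shape check. The class is a condensed restatement of the BasicBlock above so that the snippet runs on its own; the input size of 64x56x56 is an assumption taken from the conv2_x stage of an ImageNet-sized model.

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Condensed restatement of the BasicBlock above, for a self-contained demo."""
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample

    def forward(self, x):
        residual = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + residual)


x = torch.randn(1, 64, 56, 56)

# Solid-line block: stride 1, identity shortcut, shape preserved.
solid = BasicBlock(64, 64)
solid_shape = tuple(solid(x).shape)

# Dashed-line block: stride 2, so the shortcut needs a 1x1 convolution
# to match the new spatial size and channel count.
shortcut = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)
dashed = BasicBlock(64, 128, stride=2, downsample=shortcut)
dashed_shape = tuple(dashed(x).shape)

print(solid_shape)   # (1, 64, 56, 56)
print(dashed_shape)  # (1, 128, 28, 28)
```

Without the `downsample` argument, the dashed-line block would fail at the addition, because the main branch produces 128x28x28 while the raw input is still 64x56x56.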

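4. The `Bottleneck` residual structure

The deeper ResNets (50-layer, 101-layer, and 152-layer) are built from Bottleneck blocks, which learn a residual across three layers with 1x1, 3x3, and 1x1 kernels. In the Bottleneck, the 1×1 convolutions reduce and then restore the depth (channel count) of the feature matrix, which greatly cuts the number of parameters. Specifically, the first 1×1 convolution reduces the depth from 256 to 64, and the third 1×1 convolution raises it from 64 back to 256. Reducing the depth is mainly about cutting parameters; restoring it afterwards makes the output of the main branch match the shape of the shortcut branch so that the two can be added.

Note that the hidden feature maps have relatively few channels, one quarter of the output channel count. In this example, the first two convolutions produce 64 channels and the last produces 256, so the hidden width is 1/4 of the output width.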

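# Residual structure for ResNet50/101/152, using 1x1 + 3x3 + 1x1 convolutions
class Bottleneck(nn.Module):
    """
    Note: in the original paper, on the main branch of the dashed-line residual structure,
    the first 1x1 convolution has stride 2 and the 3x3 convolution has stride 1.
    The official PyTorch implementation instead uses stride 1 for the first 1x1 convolution
    and stride 2 for the 3x3 convolution, which improves top-1 accuracy by about 0.5%.
    See ResNet v1.5: https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    # The third convolution has 4x as many kernels as the first and second
    expansion = 4      # multiplier for the output channel count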

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # conv1   1x1
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        # conv2   3x3
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        # stride comes from the caller: 1 for the solid-line structure, 2 for the dashed-line structure
        self.bn2 = nn.BatchNorm2d(planes)
        # conv3   1x1  
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:  # shortcut branch
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

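Bottleneck is the other block type: __init__() defines the layers and forward() defines the forward pass. The block has three convolutions, 1x1, 3x3, and 1x1, which respectively reduce the channel dimension, convolve, and restore the channel dimension. In short, Bottleneck compresses the channel count and then expands it again. Note that planes is no longer the number of output channels here; the output has $\text{planes} \times \text{expansion}$ channels, that is, $4 \times \text{planes}$.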
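The channel flow through the main branch can be traced with three bare convolutions. This sketch leaves out BN, ReLU, and the shortcut, and assumes a 256-channel input with planes = 64, matching the example in the text; the variable names are illustrative.

```python
import torch
import torch.nn as nn

planes, expansion = 64, 4

# Main branch of a Bottleneck (BN and ReLU omitted for brevity).
reduce = nn.Conv2d(256, planes, kernel_size=1, bias=False)                  # 256 -> 64
conv = nn.Conv2d(planes, planes, kernel_size=3, padding=1, bias=False)      # 64 -> 64
restore = nn.Conv2d(planes, planes * expansion, kernel_size=1, bias=False)  # 64 -> 256

x = torch.randn(1, 256, 56, 56)
channels = []
with torch.no_grad():
    out = x
    for layer in (reduce, conv, restore):
        out = layer(out)
        channels.append(out.shape[1])

shapes_match = out.shape == x.shape  # required for the addition out + x
print(channels)      # [64, 64, 256]
print(shapes_match)  # True
```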

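5. Parameter count of the residual structures

Suppose both residual structures take and produce feature matrices with 256 channels.

Ignoring bias terms, the number of parameters of a convolutional layer is $D_K \times D_K \times M \times N$, where $D_K$ is the kernel size, $M$ the number of input channels, and $N$ the number of output channels.

With the BasicBlock structure, the parameter count is 3×3×256×256 + 3×3×256×256 = 1179648.
With the Bottleneck structure, the parameter count is 1×1×256×64 + 3×3×64×64 + 1×1×64×256 = 69632.

Summary: Bottleneck is clearly the better choice when building deep networks.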
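The two totals follow directly from the formula. A small sketch, with a helper name chosen for illustration:

```python
def conv_params(kernel: int, in_channels: int, out_channels: int) -> int:
    """Parameters of a square-kernel convolution without bias: D_K * D_K * M * N."""
    return kernel * kernel * in_channels * out_channels

# BasicBlock: two 3x3 convolutions, 256 -> 256 each.
basic_block = conv_params(3, 256, 256) + conv_params(3, 256, 256)

# Bottleneck: 1x1 (256 -> 64), 3x3 (64 -> 64), 1x1 (64 -> 256).
bottleneck = conv_params(1, 256, 64) + conv_params(3, 64, 64) + conv_params(1, 64, 256)

ratio = basic_block / bottleneck
print(basic_block)  # 1179648
print(bottleneck)   # 69632
print(round(ratio, 1))
```

For a 256-channel input, the Bottleneck needs roughly 17 times fewer convolution parameters than the BasicBlock, even though it has one more layer.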

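V. The ResNet network

ResNet is based on VGG19, modified and extended with residual units through the shortcut mechanism. Compared with a plain network, ResNet adds a shortcut around every two layers, which gives residual learning.

Compared with VGG19, the main changes are that ResNet downsamples directly with stride-2 convolutions and replaces VGG's large fully connected layers with a global average pooling layer followed by a single fully connected layer. This reflects an important design rule in ResNet: when the feature map size is halved, the number of channels is doubled, which keeps the per-layer complexity constant.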
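The "halve the size, double the channels" rule can be checked by counting multiply-accumulate operations for one 3x3 convolution before and after a downsampling step. The sizes below (56x56 with 64 channels, then 28x28 with 128 channels) are those of conv2_x and conv3_x in ResNet18/34 for a 224x224 input; the helper name is illustrative.

```python
def conv_multiply_adds(height: int, width: int, kernel: int,
                       in_channels: int, out_channels: int) -> int:
    """Multiply-accumulate count of one convolution producing a height x width output."""
    return height * width * kernel * kernel * in_channels * out_channels

stage_56 = conv_multiply_adds(56, 56, 3, 64, 64)
stage_28 = conv_multiply_adds(28, 28, 3, 128, 128)

print(stage_56)  # 115605504
print(stage_28)  # 115605504
```

Halving height and width divides the number of output positions by 4, while doubling both input and output channels multiplies the cost per position by 4, so the two effects cancel.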

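1. ResNet architecture

A ResNet has four stacks (stages), each a sequence of blocks. A list such as [3, 4, 6, 3] gives the number of blocks in each stack; different lists give ResNets of different depths.

resnet18: ResNet(BasicBlock, [2, 2, 2, 2])
resnet34: ResNet(BasicBlock, [3, 4, 6, 3])
resnet50: ResNet(Bottleneck, [3, 4, 6, 3])
resnet101: ResNet(Bottleneck, [3, 4, 23, 3])
resnet152: ResNet(Bottleneck, [3, 8, 36, 3])

ResNet50 is divided into five groups of layers: conv1, conv2_x, conv3_x, conv4_x, and conv5_x. Its depth is $1 + 3 \times 3 + 4 \times 3 + 6 \times 3 + 3 \times 3 + 1 = 50$: the initial convolutional layer, the four groups of Bottleneck blocks with three convolutions each, and the final fully connected layer. Pooling layers are not counted.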
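The depth in each model name can be derived from the block type and the block list. This sketch uses the convention stated in the naming section (only convolutional and fully connected layers count) and ignores the 1x1 convolutions on shortcut branches; the function and dictionary names are made up for illustration.

```python
CONVS_PER_BLOCK = {"BasicBlock": 2, "Bottleneck": 3}

def resnet_depth(block: str, blocks_per_stage: list) -> int:
    """conv1 + convolutions inside all residual blocks + final fully connected layer."""
    return 1 + CONVS_PER_BLOCK[block] * sum(blocks_per_stage) + 1

configs = {
    "resnet18": ("BasicBlock", [2, 2, 2, 2]),
    "resnet34": ("BasicBlock", [3, 4, 6, 3]),
    "resnet50": ("Bottleneck", [3, 4, 6, 3]),
    "resnet101": ("Bottleneck", [3, 4, 23, 3]),
    "resnet152": ("Bottleneck", [3, 8, 36, 3]),
}

depths = {name: resnet_depth(block, stages) for name, (block, stages) in configs.items()}
print(depths)
```

Each computed depth equals the number in the model name, which confirms that the names count weight layers only.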

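2. ResNet innovations

Very deep network structures (more than 1000 layers are possible).
The residual structure, which mitigates the degradation problem.
BN layers, which counter vanishing and exploding gradients and speed up training (dropout is no longer used).

Images are usually standardized during preprocessing, which speeds up convergence; BN applies the same idea to feature maps inside the network, normalizing them to mean 0 and variance 1.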

3. Core code

# The overall network
class ResNet(nn.Module):
    # block = BasicBlock or Bottleneck
    # layers lists the number of residual blocks in conv2_x to conv5_x, e.g. [3, 4, 6, 3] for the 34-layer network
    def __init__(self, block, layers, num_classes=1000):  
        self.inplanes = 64 
        super(ResNet, self).__init__()
        # 1.conv1
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        # 2.conv2_x
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        # 3.conv3_x
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        # 4.conv4_x
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        # 5.conv5_x
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        
        self.avgpool = nn.AvgPool2d(7)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        # Initialize weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        # The first residual block of the stage (added above) may downsample
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            # The remaining blocks are solid-line residual structures
            layers.append(block(self.inplanes, planes))

        # nn.Sequential chains the blocks together
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)   # flatten to shape (batch, features)
        x = self.fc(x)

        return x

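ResNet has five stages. The first is a 7x7 convolution with stride 2 followed by a pooling layer, after which the feature map has 1/4 the height and width of the input. _make_layer() builds the remaining four stages (layer1 to layer4) from the layers list passed in.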
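A detail of _make_layer() that is easy to miss is when the first block of a stage gets a 1x1 convolution on its shortcut. The sketch below mirrors only that condition in plain Python; the function names are illustrative, and the stage settings are the (planes, stride) pairs passed in the constructor above.

```python
def needs_projection(inplanes: int, planes: int, expansion: int, stride: int) -> bool:
    """Mirror of the condition in _make_layer for the first block of a stage."""
    return stride != 1 or inplanes != planes * expansion

def trace_stages(expansion: int) -> list:
    inplanes = 64  # channels after conv1 and max pooling
    decisions = []
    for planes, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
        decisions.append(needs_projection(inplanes, planes, expansion, stride))
        inplanes = planes * expansion  # output channels of this stage
    return decisions

basic = trace_stages(expansion=1)       # ResNet18/34
bottleneck = trace_stages(expansion=4)  # ResNet50/101/152
print(basic)       # [False, True, True, True]
print(bottleneck)  # [True, True, True, True]
```

With BasicBlock, layer1 keeps 64 channels at stride 1 and needs no projection. With Bottleneck, layer1 already expands 64 channels to 256, so its first block uses a dashed-line shortcut even though the spatial size stays the same.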

# resnet18
def resnet18(pretrained=False):
    """Constructs a ResNet-18 model.
	# https://download.pytorch.org/models/resnet18-f37072fd.pth
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2])
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
    return model

# resnet34
def resnet34(pretrained=False):
    """Constructs a ResNet-34 model.
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [3, 4, 6, 3])
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))
    return model

# resnet50
def resnet50(pretrained=False):
    """Constructs a ResNet-50 model.
	# https://download.pytorch.org/models/resnet50-19c8e357.pth
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3])
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
    return model

# resnet101
def resnet101(pretrained=False):
    """Constructs a ResNet-101 model.
	# https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 23, 3])
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
    return model

# resnet152
def resnet152(pretrained=False):
    """Constructs a ResNet-152 model.
	# https://download.pytorch.org/models/resnet152-394f9c45.pth
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 8, 36, 3])
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet152']))
    return model

VI. The basic ResNet18

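1. ResNet18 architecture

The 18-layer ResNet is called ResNet18. Its depth of 18 counts the layers with weights, that is, convolutional and fully connected layers, excluding pooling and BN layers: 17 convolutional layers plus 1 FC layer.

A dashed-line shortcut uses a 1×1 convolution to adjust dimensions: it downsamples the feature matrix in height and width and changes its depth to the channel count required by the next residual structure.

The channel count doubles. The 1x1 convolution adjusts the number of channels; a solid line means the channel count is unchanged within the block, and a dashed line means it changes, for example 64->128.
The height and width of the feature matrix are halved. The stride is set to 2 to downsample.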

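Notes:

BN stands for batch normalization.
ReLU is the activation function.
lambda x: x is the function whose output equals its input.
identity denotes the shortcut branch.
In ResNet18, one ResNet block (stage) contains 2 basic blocks and therefore 2 shortcut additions.
The shortcut that changes shape at the start of a stage is a 1x1 convolution; all other shortcuts are lambda x: x.
Identity shortcuts are drawn as solid arrows and 1x1-convolution shortcuts as dashed arrows.
A 3x3 convolution with s=2, p=1 halves the feature map size.
A 3x3 convolution with s=1, p=1 keeps the feature map size.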

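(1) conv1 layer

The input first passes through a convolutional layer with a 7x7 kernel, stride 2, padding 3, and 64 output channels. By the formula
$n_{out}=\left\lfloor\frac{n_{in}+2p-k}{s}\right\rfloor+1$
the output for a 3x224x224 input has size 64x112x112.

(2) Max pooling layer

The max pooling layer has a 3x3 window, stride 2, and padding 1. Its output has size 64x56x56: the channel count is unchanged and the height and width are halved.

(3) conv2_x (ResNet block 1)

The convolutions in this stage have 3x3 kernels, stride 1, and padding 1. Each basic block applies two such convolutions, and the output has size 64x56x56; this stage changes neither the spatial size nor the channel count.

(4) conv3_x (ResNet block 2)

The shortcut of the first block uses a 1x1 convolution to raise the channel count, and the stage downsamples with stride 2. The output is 128x28x28: the channel count doubles and the height and width are halved.

(5) conv4_x (ResNet block 3)

Again a 1x1 convolution on the shortcut and a stride-2 downsampling. The output is 256x14x14: the channel count doubles and the height and width are halved.

(6) conv5_x (ResNet block 4)

Same as above. The output is 512x7x7: the channel count doubles and the height and width are halved.

(7) Average pooling layer

The output is 512x1x1.

(8) FC layer

The 512 features are flattened and mapped to num_classes outputs (1000 by default) by a single fully connected layer.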
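The sizes in steps (1) to (6) can be reproduced by applying the output-size formula layer by layer. This is a sketch for a 224x224 input; only the first, size-changing operation of each stage is traced, and the helper name is illustrative.

```python
def out_size(n_in: int, kernel: int, stride: int, padding: int) -> int:
    """n_out = floor((n_in + 2p - k) / s) + 1"""
    return (n_in + 2 * padding - kernel) // stride + 1

size = 224
trace = []

size = out_size(size, kernel=7, stride=2, padding=3)
trace.append(("conv1", 64, size))

size = out_size(size, kernel=3, stride=2, padding=1)
trace.append(("maxpool", 64, size))

# conv2_x uses stride 1 everywhere, so the size is unchanged.
size = out_size(size, kernel=3, stride=1, padding=1)
trace.append(("conv2_x", 64, size))

# conv3_x to conv5_x each start with a stride-2 3x3 convolution and double the channels.
for name, channels in [("conv3_x", 128), ("conv4_x", 256), ("conv5_x", 512)]:
    size = out_size(size, kernel=3, stride=2, padding=1)
    trace.append((name, channels, size))

for name, channels, side in trace:
    print(f"{name}: {channels}x{side}x{side}")
```

The final 7x7 feature map is what the average pooling layer reduces to 1x1 before the fully connected layer.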

VII. Source code walkthrough

1. `model.py`

import torch.nn as nn
import torch


# Residual structure for ResNet18/34, using two 3x3 convolutions
class BasicBlock(nn.Module):
    expansion = 1  # ratio of output channels to out_channel; 1 means the main branch keeps the channel count

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):  # downsample is used by the dashed-line structure
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=(3, 3), stride=(stride, stride),
                               padding=1, bias=False)
        # stride=1: solid-line structure, spatial size unchanged; stride=2: dashed-line structure
        # stride=1: output = (input - 3 + 2*1) / 1 + 1 = input, so the shape is unchanged
        # stride=2: output = floor((input - 3 + 2*1) / 2) + 1 = input / 2 for even input
        self.bn1 = nn.BatchNorm2d(out_channel)   # the conv before BN uses bias=False because BN has its own shift
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=(3, 3), stride=(1, 1), padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:  # dashed-line structure: the shortcut must be downsampled
            identity = self.downsample(x)  # shortcut branch

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out


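# Residual structure for ResNet50/101/152, using 1x1 + 3x3 + 1x1 convolutions
class Bottleneck(nn.Module):
    """
    Note: in the original paper, on the main branch of the dashed-line residual structure,
    the first 1x1 convolution has stride 2 and the 3x3 convolution has stride 1.
    The official PyTorch implementation instead uses stride 1 for the first 1x1 convolution
    and stride 2 for the 3x3 convolution, which improves top-1 accuracy by about 0.5%.
    See ResNet v1.5: https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    expansion = 4  # the third convolution has 4x as many kernels as the first and second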

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()

        width = int(out_channel * (width_per_group / 64.)) * groups

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=(1, 1), stride=(1, 1), bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                               kernel_size=(3, 3), stride=(stride, stride), bias=False, padding=1)
        # stride comes from the caller: 1 for the solid-line structure, 2 for the dashed-line structure
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel * self.expansion,  # 4x as many kernels
                               kernel_size=(1, 1), stride=(1, 1), bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)  # shortcut branch

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out


# The overall network
class ResNet(nn.Module):
    # block = BasicBlock or Bottleneck
    # blocks_num lists the number of residual blocks in conv2_x to conv5_x, e.g. [3, 4, 6, 3] for the 34-layer network
    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True,  # makes it easy to build other networks on top of ResNet; not needed here
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=(7, 7), stride=(2, 2),
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive average pooling to output size (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

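    # channel is the number of kernels in the first convolution of each block in the stage;
    # block_num is the number of residual blocks in the stage, e.g. 3, 4, 6, 3 for the 34-layer network
    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        # For ResNet50/101/152, block.expansion = 4
        if stride != 1 or self.in_channel != channel * block.expansion:  # true for layer2, 3, 4, and for layer1 when expansion = 4
            downsample = nn.Sequential(  # shortcut projection; for conv2_x of the deep models only the depth changes
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=(1, 1), stride=(stride, stride), bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        # Add the first residual block of the stage; block = BasicBlock or Bottleneck
        layers.append(block(self.in_channel,  # depth of the input feature matrix (64 for layer1)
                            channel,  # number of kernels in the first convolution on the main branch
                            downsample=downsample,  # dashed-line shortcut; in layer1 of 50/101/152 it keeps height and width and makes the depth 4x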
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            # The remaining blocks of the stage are solid-line residual structures
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        # nn.Sequential chains the blocks together
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


def resnet18(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet18-f37072fd.pth
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes=num_classes, include_top=include_top)


def resnet34(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)

def resnet152(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet152-394f9c45.pth
    return ResNet(Bottleneck, [3, 8, 36, 3], num_classes=num_classes, include_top=include_top)


def resnext50_32x4d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
    groups = 32
    width_per_group = 4
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth
    groups = 32
    width_per_group = 8
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)

2. `train.py`

import os
import sys
import json
 
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets
from tqdm import tqdm
 
from model import resnet34


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))
 
    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        "val": transforms.Compose([transforms.Resize(256),      # keep the aspect ratio and scale the shorter side to 256
                                   transforms.CenterCrop(224),      # center crop
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}
 
    data_root = os.path.abspath(os.path.join(os.getcwd(), "../"))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)
 
    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)
 
    batch_size = 4
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))
 
    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)
 
    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)
 
    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))
    
    net = resnet34()
    # load pretrain weights
    # download url: https://download.pytorch.org/models/resnet34-333f7ec4.pth
    model_weight_path = "./resnet34-pre.pth"
    assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
    net.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
    # for param in net.parameters():
    #     param.requires_grad = False
 
    # change fc layer structure
    in_channel = net.fc.in_features
    net.fc = nn.Linear(in_channel, 5)
    net.to(device)
 
    # define loss function
    loss_function = nn.CrossEntropyLoss()
 
    # construct an optimizer
    params = [p for p in net.parameters() if p.requires_grad]
    optimizer = optim.Adam(params, lr=0.0001)
 
    epochs = 3
    best_acc = 0.0
    save_path = './resNet34.pth'
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader, file=sys.stdout)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            logits = net(images.to(device))
            loss = loss_function(logits, labels.to(device))
            loss.backward()
            optimizer.step()
 
            # print statistics
            running_loss += loss.item()
 
            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)
 
        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader, file=sys.stdout)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                # loss = loss_function(outputs, test_labels)
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()
 
                val_bar.desc = "valid epoch[{}/{}]".format(epoch + 1,
                                                           epochs)
 
        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))
 
        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)
 
    print('Finished Training')
 
 
if __name__ == '__main__':
    main()
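The two commented-out lines in the script (`param.requires_grad = False`) hint at a cheaper variant of transfer learning: freeze the pretrained backbone and train only the new classification head. Below is a minimal sketch of that pattern on a hypothetical stand-in model rather than on resnet34 itself, so that it runs without the weight file; the layer sizes are arbitrary.

```python
import torch.nn as nn

# Hypothetical stand-in for a pretrained network: a tiny backbone plus a 1000-class head.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(8, 1000),
)

# Freeze every existing parameter.
for param in model.parameters():
    param.requires_grad = False

# Replace the head afterwards; parameters of a new layer require gradients by default.
model[4] = nn.Linear(8, 5)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['4.weight', '4.bias']
```

The order matters: the head must be replaced after freezing, otherwise it is frozen too. The optimizer in the script already filters on `p.requires_grad`, so it would pick up only the new head.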

3. `predict.py`

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import resnet34


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
 
    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
 
    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)
 
    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)
 
    with open(json_path, "r") as f:
        class_indict = json.load(f)
 
    # create model
    model = resnet34(num_classes=5).to(device)
 
    # load model weights
    weights_path = "./resNet34.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))
 
    # prediction
    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()
 
    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()
 
 
if __name__ == '__main__':
    main()
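The prediction step at the end of the script reduces to softmax, argmax, and a dictionary lookup. The sketch below isolates that step with made-up logits in place of the model output; the class names follow the comment in train.py, and the string keys mirror what json.load returns for class_indices.json.

```python
import torch

# Same shape as the mapping loaded from class_indices.json (JSON keys are strings).
class_indict = {"0": "daisy", "1": "dandelion", "2": "roses", "3": "sunflower", "4": "tulips"}

# Made-up logits standing in for torch.squeeze(model(img)).
logits = torch.tensor([0.5, 1.0, -0.3, 0.2, 3.0])

probs = torch.softmax(logits, dim=0)        # probabilities that sum to 1
predicted_index = int(torch.argmax(probs))  # index of the most likely class
predicted_class = class_indict[str(predicted_index)]

print(predicted_class)  # tulips
```

Softmax does not change which class wins, so argmax over the raw logits gives the same index; softmax is needed only to report a probability.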