Convolutional Neural Networks (Part 3): Case Studies

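The previous part introduced the convolution modules in PyTorch. This part walks through several well-known convolutional neural networks and uses them as case studies to explain how such networks are designed.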

1. LeNet

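LeNet is the pioneering work of convolutional neural networks, proposed by LeCun in 1998. Its structure is very simple, which makes it a good starting point from which we can work step by step toward the architectures that are most popular today.

The structure of LeNet is shown in Figure 4.16.

As Figure 4.16 shows, the structure is very clear. The network has 7 layers in total: 2 convolutional layers alternate with 2 pooling layers, followed by 3 fully connected layers that produce the final output.

A network this simple is easy to implement ourselves.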

import torch
from torch import nn


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()

        layer1 = nn.Sequential()
        layer1.add_module('conv1', nn.Conv2d(1, 6, 3, padding=1))
        layer1.add_module('pool1', nn.MaxPool2d(2, 2))
        self.layer1 = layer1

        layer2 = nn.Sequential()
        layer2.add_module('conv2', nn.Conv2d(6, 16, 5))
        layer2.add_module('pool2', nn.MaxPool2d(2, 2))
        self.layer2 = layer2

        layer3 = nn.Sequential()
        # 400 = 16 channels * 5 * 5, which assumes a 1x28x28 input image
        layer3.add_module('fc1', nn.Linear(400, 120))
        layer3.add_module('fc2', nn.Linear(120, 84))
        layer3.add_module('fc3', nn.Linear(84, 10))
        self.layer3 = layer3

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.view(x.size(0), -1)
        x = self.layer3(x)
        return x

This completes the LeNet implementation. Note that the network is very shallow and that no activation layers have been added.
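To see where the 400 in the first fully connected layer comes from, the feature-map size can be traced layer by layer. The sketch below is plain Python and assumes a 1x28x28 input; the helper `conv_out` is a hypothetical name introduced only for this illustration and uses the standard output-size formula for convolution and pooling.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                    # assumed input: 1 x 28 x 28
size = conv_out(size, kernel=3, padding=1)   # conv1 -> 28
size = conv_out(size, kernel=2, stride=2)    # pool1 -> 14
size = conv_out(size, kernel=5)              # conv2 -> 10
size = conv_out(size, kernel=2, stride=2)    # pool2 -> 5

flat_features = 16 * size * size             # 16 channels after conv2
print(flat_features)                         # 400, matching nn.Linear(400, 120)
```

With a different input size, the flattened length changes and the first nn.Linear layer would have to be adjusted accordingly.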

2. AlexNet

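Next comes AlexNet, which made a splash in the 2012 ImageNet competition. It won the championship with an accuracy roughly 10 percentage points ahead of the runner-up and showed the world the power of deep learning.

First, look at the structure of AlexNet, shown in Figure 4.17.

Figure 4.17 may look bewildering. The reason is that GPUs at the time had limited computing power while AlexNet was comparatively complex, so Alex Krizhevsky ran the computation on two GPUs in parallel. Today a single GPU is entirely sufficient.

Compared with LeNet, AlexNet is deeper. It also introduces the ReLU activation layer and adds Dropout layers in the fully connected part to prevent overfitting.

The AlexNet structure can be implemented as follows.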

class AlexNet(nn.Module):
    def __init__(self, num_classes):
        super(AlexNet, self).__init__()

        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes)
        )
    
    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256*6*6)
        x = self.classifier(x)
        return x
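A rough parameter count helps explain why Dropout is placed in the fully connected part. The sketch below is plain Python and counts weights and biases from the layer sizes in the code above. It assumes 1000 output classes and a 4096-to-4096 linear layer as the second hidden layer of the classifier; `conv_params` and `linear_params` are hypothetical helper names used only here.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


num_classes = 1000  # assumption: the ImageNet class count

features = sum([
    conv_params(3, 64, 11),
    conv_params(64, 192, 5),
    conv_params(192, 384, 3),
    conv_params(384, 256, 3),
    conv_params(256, 256, 3),
])
classifier = sum([
    linear_params(256 * 6 * 6, 4096),
    linear_params(4096, 4096),        # assumed second hidden layer
    linear_params(4096, num_classes),
])

total = features + classifier
print(features, classifier, total)
print(f"classifier share: {classifier / total:.1%}")
```

Under these assumptions the three fully connected layers hold the large majority of the parameters, which is where regularization such as Dropout has the most to act on.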

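This was the first time in the history of the ImageNet competition that a model based on convolutional neural networks won the championship, and it set off the deep learning revolution in computer vision.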

3. VGGNet

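VGGNet was the runner-up of ImageNet 2014. In short, it uses smaller filters and a deeper structure. AlexNet has only 8 layers, whereas VGGNet has 16 to 19. Unlike AlexNet, which uses filters as large as 11x11, VGGNet uses only 3x3 convolution filters and 2x2 max-pooling layers. Figure 4.18 compares AlexNet and VGGNet.

It uses many small filters because a stack of small filters has the same receptive field as one large filter. For example, three stacked 3x3 convolutions cover the same 7x7 region as a single 7x7 convolution, while using fewer parameters and allowing a deeper network structure.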
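The receptive-field and parameter claims can be checked with a few lines of arithmetic. This is a minimal sketch in plain Python: it assumes stride-1 convolutions with the same number of input and output channels in every layer, ignores biases, and uses a hypothetical channel count `C`; the function names are introduced only for this illustration.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(channels, kernel, num_layers=1):
    """Weight count (biases ignored) with `channels` inputs and outputs per layer."""
    return num_layers * kernel * kernel * channels * channels


C = 256  # hypothetical channel count

print(stacked_receptive_field(3))          # 7, same as one 7x7 filter
print(conv_weights(C, 3, num_layers=3))    # 27 * C^2 = 1769472
print(conv_weights(C, 7))                  # 49 * C^2 = 3211264
```

For the same 7x7 receptive field, the stacked version needs 27/49 of the weights, a little over half.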

It can be implemented in the same way.

class VGG(nn.Module):
    def __init__(self, num_classes):
        super(VGG, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(512*7*7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes)
        )
        # No custom initializer is defined here, so PyTorch's default layer initialization is used.

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

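As the code shows, VGG simply keeps stacking layers and does not introduce much innovation, yet the added depth does improve model performance to some extent.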
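Because the network is a plain stack, its layout can be summarized as a short list. The sketch below is plain Python: `cfg` is a hypothetical list that transcribes the `features` block above, and a 3x224x224 input is assumed. It counts the weight layers and recovers the input size of the first fully connected layer.

```python
# 'M' marks a 2x2 max-pooling layer; numbers are conv output channels.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

size, channels = 224, 3   # assumed 3 x 224 x 224 input
num_conv = 0
for item in cfg:
    if item == 'M':
        size //= 2            # 2x2 pooling with stride 2 halves the size
    else:
        channels = item       # 3x3 conv with padding=1 keeps the size
        num_conv += 1

num_fc = 3                    # the three nn.Linear layers in the classifier
print(num_conv + num_fc)      # 16 weight layers
print(channels * size * size) # 25088 = 512 * 7 * 7
```

The 13 convolutional layers plus 3 fully connected layers give the 16 layers mentioned earlier, and the flattened size matches nn.Linear(512*7*7, 4096).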

4. GoogLeNet

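GoogLeNet, also known as InceptionNet, was proposed in 2014 and has since evolved to version v4. This section introduces its core component.

GoogLeNet adopts a deeper structure than VGGNet, with 22 layers in total, yet it has about 12 times fewer parameters than AlexNet and is computationally very efficient. This is because it uses a highly effective Inception module and does away with the large fully connected layers. It was the winner of the 2014 competition.

First, look at the structure of GoogLeNet and its most innovative part, the Inception module, shown in Figure 4.19.

The Inception module designs a local network topology, and these modules are then stacked on top of one another to form the higher-level network structure. Specifically, several parallel filters with different receptive fields apply convolution and pooling to the input, and their outputs are concatenated along the depth (channel) dimension to form the output.

This structure is very novel, and it is the reason GoogLeNet was so successful. It does have one problem, however: it has too many parameters, which makes it computationally expensive.

To address this, GoogLeNet introduced a follow-up version with a redesigned Inception module, shown in Figure 4.20.

This module adds several 1x1 convolutional layers that reduce the depth (number of channels) of the input before the more expensive convolutions, which lowers the number of parameters and thus the complexity of the network.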
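The saving from a 1x1 reduction can be estimated directly. The sketch below is plain Python and compares a direct 5x5 convolution with a 1x1 reduction followed by a 5x5 convolution. The channel sizes 48 and 64 match the 5x5 branch of the implementation in this section; the input depth of 192 is a hypothetical value, and `conv_weights` is a helper name introduced only here.

```python
def conv_weights(c_in, c_out, k):
    """Weight count of a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k


in_channels = 192  # hypothetical input depth

direct = conv_weights(in_channels, 64, 5)
reduced = conv_weights(in_channels, 48, 1) + conv_weights(48, 64, 5)

print(direct)    # 307200
print(reduced)   # 86016
```

With these numbers the reduced branch needs less than a third of the weights of the direct 5x5 convolution, and the gap grows as the input depth increases.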

Below is an implementation of the Inception module in GoogLeNet. The whole of GoogLeNet is built from such Inception modules.

import torch
import torch.nn.functional as F
from torch import nn


class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)


class Inception(nn.Module):
    def __init__(self, in_channels, pool_features):
        super(Inception, self).__init__()
        self.branch1x1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.branch5X5_1 = BasicConv2d(in_channels, 48, kernel_size=1)
        self.branch5X5_2 = BasicConv2d(48, 64, kernel_size=5, padding=2)

        self.branch3x3dbl_1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.branch3x3dbl_2 = BasicConv2d(64, 96, kernel_size=3, padding=1)
        self.branch3x3dbl_3 = BasicConv2d(96, 96, kernel_size=3, padding=1)

        self.branch_pool = BasicConv2d(in_channels, pool_features, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5X5_1(x)
        branch5x5 = self.branch5X5_2(branch5x5)

        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)

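The code first defines a basic convolution module, then uses it to build the 1x1, 3x3, and 5x5 branches along with a pooling branch, and finally uses torch.cat() to concatenate them along the depth dimension to produce the output.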
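The concatenation step can be looked at in isolation. In the sketch below, random tensors stand in for the four branch outputs; the batch size, the 35x35 feature-map size, and `pool_features = 32` are hypothetical values chosen for illustration, while the channel counts 64, 64, and 96 are taken from the module above.

```python
import torch

batch, height, width = 2, 35, 35   # hypothetical feature-map size
pool_features = 32                 # hypothetical value for the pooling branch

# Stand-ins for the four branch outputs; only the channel counts differ.
branch1x1 = torch.randn(batch, 64, height, width)
branch5x5 = torch.randn(batch, 64, height, width)
branch3x3dbl = torch.randn(batch, 96, height, width)
branch_pool = torch.randn(batch, pool_features, height, width)

# dim=1 is the channel dimension in PyTorch's (N, C, H, W) layout.
out = torch.cat([branch1x1, branch5x5, branch3x3dbl, branch_pool], 1)
print(out.shape)   # torch.Size([2, 256, 35, 35])
```

Concatenation only works because every branch preserves the spatial size, which is why the 3x3 and 5x5 convolutions use padding 1 and 2 and the pooling uses stride 1 with padding 1.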

5. ResNet

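ResNet won the 2015 ImageNet competition. Proposed by Microsoft Research, it uses residual modules to successfully train neural networks as deep as 152 layers.

The original design of ResNet was inspired by the following problem: as a neural network is made deeper and deeper, a degradation phenomenon appears. Accuracy first rises and then saturates, and increasing the depth further causes accuracy to drop.

This is not an overfitting problem, because the error increases not only on the validation set but also on the training set itself. Suppose a relatively shallow network has reached its saturated accuracy. Appending a few identity mapping layers to it would not increase the error, which means that a deeper model should at least not perform worse.

The idea mentioned here, using an identity mapping to pass the output of an earlier layer directly to a later one, is the inspiration for ResNet. Suppose the input to some part of the network is $x$ and the desired output is $H(x)$. If the input $x$ is passed directly to the output as an initial result, the target left to learn is $F(x) = H(x) - x$, which is the residual module shown in Figure 4.21.

The left side of Figure 4.21 is an ordinary network and the right side is a ResNet residual learning unit. ResNet in effect changes the learning target: instead of learning the complete output $H(x)$, the layers learn the difference between output and input, $H(x) - x$, that is, the residual.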
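Written out, the residual formulation is short. The first two lines restate the definitions above. The third line is a standard consequence of the sum, added here only to illustrate why the shortcut gives gradients a direct path; it ignores any activation applied after the addition.

```latex
\begin{aligned}
H(x) &= F(x) + x \\
F(x) = 0 \;&\Longrightarrow\; H(x) = x \quad \text{(the block reduces to an identity mapping)} \\
\frac{\partial H}{\partial x} &= \frac{\partial F}{\partial x} + I
\end{aligned}
```

If the identity mapping is close to optimal, the layers only need to push $F(x)$ toward zero, which is easier than fitting an identity with a stack of nonlinear layers.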

Because of the residual module, the whole network can be trained with up to 152 layers. Only the residual module of ResNet is shown below as an example.

def conv3x3(in_planes, out_planes, stride=1):
    "3x3 convolution with padding"
    return nn.Conv2d(
        in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False
    )

class BasicBlock(nn.Module):
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        
        out = self.conv2(out)
        out = self.bn2(out)
        
        if self.downsample is not None:
            residual = self.downsample(x)
        
        out += residual
        out = self.relu(out)
        
        return out

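Near the end of forward, the line out += residual adds the original input x (or its downsampled version) to the output, which forms the residual structure.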
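The downsample argument matters whenever the main path changes the shape of the tensor, because the addition requires identical shapes. The sketch below shows one common choice, a 1x1 convolution followed by batch normalization on the shortcut. The tensor sizes are hypothetical, and only the first convolution of the main path is shown.

```python
import torch
from torch import nn

x = torch.randn(1, 64, 56, 56)   # hypothetical input feature map

# Main path: a 3x3 convolution with stride 2 that also doubles the channels.
main = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)

# Shortcut: 1x1 convolution plus batch norm so that shapes match for the addition.
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)

out = main(x)
residual = downsample(x)
print(out.shape, residual.shape)   # both are (1, 128, 28, 28)
out = out + residual               # would fail if x were added directly
```

A module like `downsample` here is what would be passed as the downsample argument of BasicBlock together with stride=2.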

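Beyond these well-known networks, there are many others in the world of convolutional neural networks, such as Network in Network and Highway Network, and readers can look up the corresponding papers. There is also no need to reinvent the wheel: PyTorch already implements AlexNet, VGG, GoogLeNet/Inception, and ResNet in torchvision.models, and most of them come with pretrained parameters. These pretrained networks lay the groundwork for the transfer learning and fine-tuning covered later.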