Table of Contents
- LeNet
- AlexNet
- Dropout
- AlexNet Network Structure
- The AlexNet Implementation in torchvision
- ZFNet
- VGG-Nets
- The VGG Network Variants
- VGG-16 Network Structure
- GoogLeNet
- Code Implementation
- ResNet
- DenseNet
- RNN
- LSTM
- GRU
LeNet
- Proposed by LeCun in 1998
- Designed for handwritten digit recognition
- It has only a five-layer structure and would not count as a deep network by today's standards, but it essentially fixed the basic CNN architecture: convolutional layers, pooling layers, and fully connected layers
- Many CNNs still follow this architecture and innovate on top of it, e.g. AlexNet, ZF-Net, VGG-Nets, DenseNet
- Because it is so simple, torchvision does not include this model
LeNet Network Structure
- Input: a 1 x 28 x 28 grayscale image
- 1st convolutional layer: 20 kernels of size 5x5, stride=1; output is 20 x 24 x 24
- 1st pooling layer: 2x2 max-pooling with stride=2; pooling halves the spatial size, giving 20 x 12 x 12
- 2nd convolutional layer: 50 kernels of size 5x5, stride=1; output is 50 x 8 x 8
- 2nd pooling layer: 2x2 max-pooling with stride=2; pooling halves the spatial size, giving 50 x 4 x 4
- 1st fully connected layer: 500 neurons; its input is the flattened 50 x 4 x 4 = 800-dimensional feature map; the result goes through a ReLU activation to add non-linearity
- 2nd fully connected layer: 10 neurons; the output is fed into a softmax to compute a score for each class and complete the classification
LeNet-5 Network Structure
LeNet-5 consists of seven layers (not counting the input layer), each of which contains trainable weights.
LeNet Code Implementation
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
class LeNet(nn.Module):
def __init__(self):
super().__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=(5, 5), stride=1)
self.pool1 = nn.MaxPool2d(2)
self.conv2 = nn.Conv2d(20, 50, kernel_size=(5, 5), stride=1)
self.pool2 = nn.MaxPool2d(2)
self.fc1 = nn.Linear(800, 500)
self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(500, 10)
self.relu2 = nn.ReLU()
def forward(self, x):
x = self.pool1(self.conv1(x))
x = self.pool2(self.conv2(x))
x = self.relu1(self.fc1(x.view(-1, 800)))
x = self.relu2(self.fc2(x))
        return F.log_softmax(x, dim=1)
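A quick sanity check of the layer shapes listed above (a minimal sketch; it assumes the LeNet class defined above and the earlier imports):
net = LeNet()
x = torch.randn(1, 1, 28, 28)  # a single 1 x 28 x 28 grayscale image
out = net(x)
print(out.shape)  # torch.Size([1, 10]) -- log-probabilities for the 10 digit classes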
AlexNet
- Designed in 2012 by Hinton and his student Alex Krizhevsky; winner of that year's ImageNet competition.
- 论文:ImageNet Classification with Deep Convolutional Neural Networks
https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
Advantages over LeNet:
- A deeper network structure and thus greater learning capacity
  The first five layers are convolutional and the last three are fully connected; a softmax function performs the 1000-class classification.
- Data augmentation, which improves generalization (see the transforms sketch after this list)
  Augmentation methods: random cropping, random rotation, brightness changes, etc.
- Dropout, which reduces overfitting
  Borrowing the ensembling idea from machine learning, neurons in the FC layers are deactivated with some probability (commonly 50%); deactivated neurons take part in neither the forward nor the backward pass. With (inverted) dropout the surviving outputs are divided by the keep probability 1 - p, so the expected activation stays the same.
- Local Response Normalization (LRN)
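As a hedged illustration of the augmentation bullet above, a minimal torchvision.transforms pipeline might look like this; the crop size, rotation angle and jitter strength are illustrative assumptions, not values from the AlexNet paper:
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),       # random cropping
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),           # small random rotation
    transforms.ColorJitter(brightness=0.2),  # brightness change
    transforms.ToTensor(),
])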
Other characteristics
- Compared with LeNet it has more convolutional layers, but the overall structure and data flow are unchanged; limited by the GPUs available at the time, training was run on two GPUs in parallel.
- LRN: Local Response Normalization (a hedged example follows)
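A small sketch of LRN in PyTorch; the hyper-parameters below (size=5, alpha=1e-4, beta=0.75, k=2) follow the values commonly quoted for AlexNet and are stated here as an assumption:
import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.randn(1, 64, 55, 55)
print(lrn(x).shape)  # LRN normalizes across neighboring channels; the shape is unchanged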
Dropout
a = torch.Tensor([[1, 2, 3, 4], [5, 6, 7, 8]])
b = nn.Dropout(0.5)
b(a)
'''
One possible output -- dropout is random, and the surviving entries are scaled by 1/(1-p) = 2:
tensor([[ 0.,  0.,  0.,  8.],
        [10., 12., 14., 16.]])
'''
AlexNet Network Structure
The AlexNet Implementation in torchvision
Reference: https://blog.csdn.net/pengchengliu/article/details/108909195
import torch
import torch.nn as nn
from .utils import load_state_dict_from_url
__all__ = ['AlexNet', 'alexnet']
model_urls = {
'alexnet': 'https://download.pytorch.org/models/alexnet-owt-4df8aa71.pth',
}
class AlexNet(nn.Module):
def __init__(self, num_classes=1000):
super(AlexNet, self).__init__()
self.features = nn.Sequential(
#conv1
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # 3 input channels (an RGB image); 64 output channels, i.e. the number of convolution kernels
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
#conv2
nn.Conv2d(64, 192, kernel_size=5, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
#conv3
nn.Conv2d(192, 384, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
#conv4
nn.Conv2d(384, 256, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
#conv5
nn.Conv2d(256, 256, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
)
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))  # if the input image is not 224x224, the pooling above may not produce 6x6 feature maps; the FC layers below require a 6x6 input, so adaptive pooling forces that size
self.classifier = nn.Sequential(
#FC1
            nn.Dropout(),  # helps reduce overfitting
            nn.Linear(256 * 6 * 6, 4096),  # fully connected layer: the input is the 256 feature maps of size 6x6, and the layer has 4096 neurons
nn.ReLU(inplace=True),
#FC2
nn.Dropout(),
nn.Linear(4096, 4096),
nn.ReLU(inplace=True),
#FC3
            nn.Linear(4096, num_classes),  # num_classes is 1000 here
)
def forward(self, x):
x = self.features(x)
        x = self.avgpool(x)  # adaptive average pooling
        x = torch.flatten(x, 1)  # flatten into a vector
x = self.classifier(x)
return x
def alexnet(pretrained=False, progress=True, **kwargs):
r"""AlexNet model architecture from the `"One weird trick..." <https://arxiv.org/abs/1404.5997>`_ paper.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
progress (bool): If True, displays a progress bar of the download to stderr
"""
model = AlexNet(**kwargs)
if pretrained:
state_dict = load_state_dict_from_url(model_urls['alexnet'],
progress=progress)
model.load_state_dict(state_dict)
return model
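A minimal usage sketch of the torchvision model (note that newer torchvision versions replace the pretrained flag with a weights argument):
import torch
from torchvision import models

model = models.alexnet(pretrained=False)  # set pretrained=True to download the ImageNet weights
model.eval()
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000])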
ZFNet
论文:Visualizing and Understanding Convolutional Networks
https://arxiv.org/abs/1311.2901
- Winner of the classification task in the 2013 ImageNet competition
- Not very well known. The network structure was not redesigned; it mainly tunes AlexNet's parameters and improves classification performance.
- Building on AlexNet, ZFNet
- changes the first convolutional layer's kernel size from 11 to 7
- changes its stride from 4 to 2
- keeps the subsequent convolutional layers at 384, 384 and 256 kernels in turn
- increases the number of convolutional layers.
Reproduction code:
class ZFNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Padding values chosen so that a 3 x 224 x 224 input follows the feature-map sizes of the ZFNet paper.
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=(7, 7), stride=(2, 2), padding=1),     # 224 -> 110
            nn.MaxPool2d((3, 3), stride=(2, 2), padding=1),                     # 110 -> 55
            nn.Conv2d(96, 256, kernel_size=(5, 5), stride=(2, 2)),              # 55 -> 26
            nn.MaxPool2d((3, 3), stride=(2, 2), padding=1),                     # 26 -> 13
            nn.Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=1),  # 13 -> 13
            nn.Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=1),  # 13 -> 13
            nn.Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=1),  # 13 -> 13
            nn.MaxPool2d((3, 3), stride=(2, 2)),                                # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096),  # 9216-dimensional flattened features
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(4096, 1000)
        )
    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
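A quick shape check of the reproduction above, assuming a 224 x 224 RGB input and the earlier imports:
net = ZFNet()
x = torch.randn(1, 3, 224, 224)
print(net(x).shape)  # torch.Size([1, 1000])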
VGG-Nets
- Proposed in 2014 by the University of Oxford; it won first place in the localization task and second place in the classification task.
- VGG: Visual Geometry Group
- Uses a deeper network structure and can be seen as a deepened AlexNet; it keeps the same layout of convolutional layers followed by fully connected layers.
  The names VGG-16 and VGG-19 indicate the number of weight layers (pooling layers are not counted).
- To ease the weight-initialization problem, VGG uses a pre-training strategy.
The VGG Network Variants
VGG first trains a small sub-network; once that part is stable, the network is deepened on this basis.
The figure below illustrates this process; configuration D (VGG-16) gives the best results.
VGG-16 Network Structure
Advantages of VGG-16: a deeper structure whose convolutional layers use smaller kernel sizes and strides.
Advantages of small kernels:
- Stacking them yields more non-linearities, making the decision function more discriminative;
- They have fewer parameters: per channel pair, three 3x3 convolutional layers have 3 x 3 x 3 = 27 weights, while a single 7x7 convolutional layer has 7 x 7 = 49 (a quick check follows this list).
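A minimal sketch that checks the parameter-count argument with an assumed channel count of C = 64 (bias terms omitted):
import torch.nn as nn

C = 64  # assumed number of input/output channels
three_3x3 = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False) for _ in range(3)])
one_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(three_3x3))  # 3 * (3*3) * C * C = 110592
print(count(one_7x7))    # (7*7) * C * C     = 200704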
GoogLeNet
- Proposed in 2014 by Christian Szegedy; winner of the ImageNet classification task.
- While deepening the network, it introduces the Inception module in place of a plain convolution + activation layer.
  Articles on Inception: https://blog.csdn.net/weixin_43821559/article/details/123379833
  https://blog.csdn.net/weixin_44772440/article/details/122952961
- The whole GoogLeNet network is essentially a stack of Inception modules.
- Paper: Going Deeper with Convolutions
  https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf
Code Implementation
class BasicConv2d(nn.Module):
def __init__(self, in_channels, out_channels, **kwargs):
super(BasicConv2d, self).__init__()
self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
self.bn = nn.BatchNorm2d(out_channels, eps=0.01)
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
return F.relu(x, inplace=True)
class Inception(nn.Module):
def __init__(self, in_channels, pool_features):
super(Inception, self).__init__()
self.branch1X1 = BasicConv2d(in_channels, 64, kernel_size=1)
self.branch5X5_1 = BasicConv2d(in_channels, 48, kernel_size=1)
self.branch5X5_2 = BasicConv2d(48, 64, kernel_size=5, padding=2)
self.branch3X3_1 = BasicConv2d(in_channels, 64, kernel_size=1)
self.branch3X3_2 = BasicConv2d(64, 96, kernel_size=3, padding=1)
self.branch_pool = BasicConv2d(in_channels, pool_features, kernel_size=1)
def forward(self, x):
branch1X1 = self.branch1X1(x)
branch5X5 = self.branch5X5_1(x)
branch5X5 = self.branch5X5_2(branch5X5)
branch3X3 = self.branch3X3_1(x)
branch3X3 = self.branch3X3_2(branch3X3)
branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
branch_pool = self.branch_pool(branch_pool)
outputs = [branch1X1, branch3X3, branch5X5, branch_pool]
return torch.cat(outputs, 1)
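A usage sketch for the Inception module above; the input size and pool_features value are assumptions. Because the four branches are concatenated, the output has 64 + 96 + 64 + pool_features channels:
inception = Inception(in_channels=192, pool_features=32)
x = torch.randn(1, 192, 28, 28)
print(inception(x).shape)  # torch.Size([1, 256, 28, 28]) -- 64 + 96 + 64 + 32 channels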
ResNet
- Proposed in 2015 by Kaiming He; it won the ILSVRC and COCO competitions.
- The network reaches 152 layers and introduces residual units to address the vanishing-gradient problem.
  On vanishing gradients: gradients propagate from back to front; as depth grows, the gradients reaching the early layers become very small (a consequence of the chain rule), so those layers essentially stop learning -- this is the vanishing-gradient problem.
  Degradation problem: a deeper network has a larger parameter space, which makes optimization harder.
- Paper: Deep Residual Learning for Image Recognition
  https://arxiv.org/abs/1512.03385
Residual network:
Structurally it resembles a "short circuit" (shortcut connection) in a circuit.
Code Implementation
class Bottleneck(nn.Module):
expansion = 4
def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
self.bn3 = nn.BatchNorm2d(planes * self.expansion)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
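A usage sketch of the bottleneck block above, with assumed sizes; inplanes must equal planes * expansion (or a downsample module must be passed) so that the residual addition is shape-compatible:
block = Bottleneck(inplanes=256, planes=64)
x = torch.randn(1, 256, 56, 56)
print(block(x).shape)  # torch.Size([1, 256, 56, 56]) -- 1x1 -> 3x3 -> 1x1, then the shortcut is added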
DenseNet
- Won the CVPR 2017 Best Paper Award
  Densely Connected Convolutional Networks
  https://arxiv.org/abs/1608.06993
- Conceptually it absorbs ideas from both ResNet and Inception and designs a new network structure.
- It is a densely connected CNN: any two layers (within a block) are directly connected, i.e. each layer's input is the concatenation of the feature maps of all preceding layers; this shortcut structure alleviates vanishing gradients.
- Dense connections exist only inside a DenseBlock; there are no dense connections between different DenseBlocks.
- In the forward method of a DenseLayer, the final output is the input x concatenated (cat) with the feature maps produced by the layer. A DenseBlock iterates over all of its layers, and the DenseLayers are assembled into an nn.Sequential to form a complete block.
- Strong performance, but high memory consumption.
Network Structure
Code implementation:
class _DenseLayer(nn.Sequential):
def __init__(self, num_input_features, growth_rate, bn_size, drop_rate):
super(_DenseLayer, self).__init__()
self.add_module('norm1', nn.BatchNorm2d(num_input_features)),
self.add_module('relu1', nn.ReLU(inplace=True)),
self.add_module('conv1', nn.Conv2d(num_input_features, bn_size*growth_rate, kernel_size=1, stride=1, bias=False)),
self.add_module('norm2', nn.BatchNorm2d(bn_size * growth_rate)),
self.add_module('relu2', nn.ReLU(inplace=True)),
self.add_module('conv2', nn.Conv2d(bn_size*growth_rate, growth_rate, kernel_size=3, stride=1, padding=1, bias=False)),
self.drop_rate = drop_rate
def forward(self, x):
new_features = super(_DenseLayer, self).forward(x)
if self.drop_rate > 0:
new_features = F.dropout(new_features, p=self.drop_rate, training=self.training )
return torch.cat([x, new_features], 1)
class _DenseBlock(nn.Sequential):
def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate):
super(_DenseBlock, self).__init__()
for i in range(num_layers):
layer = _DenseLayer(num_input_features + i * growth_rate, growth_rate, bn_size, drop_rate)
            self.add_module('denselayer%d' % (i + 1), layer)
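A usage sketch of the block above with assumed hyper-parameters; each DenseLayer appends growth_rate feature maps, so the output has num_input_features + num_layers * growth_rate channels:
block = _DenseBlock(num_layers=4, num_input_features=64, bn_size=4, growth_rate=32, drop_rate=0)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 192, 56, 56]) -- 64 + 4 * 32 channels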
RNN
- At each time step, the output (hidden state) is computed from the current input together with the memory (hidden state) of the previous time step; see the update formula below.
- Paper: Recurrent Neural Networks as Weighted Language Recognizers
  https://arxiv.org/pdf/1711.05408.pdf
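The bullet above corresponds to the standard (tanh) update used by nn.RNN; the demo below passes nonlinearity='relu', which simply replaces tanh with ReLU:
h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})
Here x_t is the input at step t and h_{t-1} is the hidden state (memory) carried over from the previous step.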
class RNNCell(nn.Module):
def __init__(self, input_size, hidden_size, output_size, batch_size):
super(RNNCell, self).__init__()
self.batch_size = batch_size
self.hidden_size = hidden_size
self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
self.i2o = nn.Linear(input_size + hidden_size, output_size)
self.softmax = nn.LogSoftmax(dim=1)
def forward(self, input, hidden):
combined = torch.cat((input, hidden), 1)
hidden = self.i2h(combined)
output = self.i2o(hidden)
        output = self.softmax(output)
        return output, hidden
def initHidden(self):
return torch.zeros(self.batch_size, self.hidden_size)
import numpy as np

bins = 20              # number of time steps per segment
input_dim = 1          # dimensionality of the input data
lr = 0.01
epochs = 1000
hidden_size = 64       # number of hidden units
num_layers = 2
nonlinearity = 'relu'  # only 'relu' and 'tanh' are supported
class RNNDemo(nn.Module):
def __init__(self, input_dim, hidden_size, num_layers, nonlinearity):
super().__init__()
self.input_dim = input_dim
self.hidden_size = hidden_size
self.num_layers = num_layers
self.nonlinearity = nonlinearity
self.rnn = nn.RNN(
input_size = input_dim,
hidden_size = hidden_size,
num_layers=num_layers,
nonlinearity=nonlinearity
)
self.out = nn.Linear(hidden_size, 1)
def forward(self, x, h):
r_out, h_state = self.rnn(x, h)
outs = []
for record in range(r_out.size(1)):
outs.append(self.out(r_out[:, record, :]))
return torch.stack(outs, dim=1), h_state
# Train the model with the Adam optimizer and mean-squared-error loss
rnnDemo = RNNDemo(input_dim, hidden_size, num_layers, nonlinearity)
optimizer = torch.optim.Adam(rnnDemo.parameters(), lr=lr)
loss_func = nn.MSELoss()
h_state = None
for step in range(epochs):
    start, end = step * np.pi, (step + 1) * np.pi  # time span of this segment
    # use the sine values to predict the cosine values
steps = np.linspace(start, end, bins, dtype=np.float32, endpoint=False)
x_np = np.sin(steps)
y_np = np.cos(steps)
    # shape [20, 1, 1], i.e. (time_step, batch, input_size)
    x = torch.from_numpy(x_np).unsqueeze(1).unsqueeze(2)
    y = torch.from_numpy(y_np).unsqueeze(1).unsqueeze(2)  # [20, 1, 1]
    prediction, h_state = rnnDemo(x, h_state)  # the RNN returns (prediction, hidden state)
    # carry the hidden state over to the next step (detached, without gradients)
h_state = h_state.detach()
loss = loss_func(prediction, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
if (step%10) == 0:
print('-- loss : {:.8f}'.format(loss) )
plt.scatter(steps, y_np, marker='^')
plt.scatter(steps, prediction.data.numpy().flatten(), marker='.')
plt.show()
LSTM
- Proposed in 1997
- It still cannot capture very long context dependencies; it mainly extends the fairly short-term dependencies (typically within about 10 steps) that a plain RNN can handle.
- Built around three gating units: the input gate, the forget gate (the core), and the output gate; the gate equations are sketched below.
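For reference, the standard LSTM gate equations as implemented by nn.LSTM, where \sigma is the sigmoid function and \odot is element-wise multiplication:
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})    (input gate)
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})    (forget gate)
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})     (candidate cell state)
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})    (output gate)
c_t = f_t \odot c_{t-1} + i_t \odot g_t
h_t = o_t \odot \tanh(c_t)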
Code Implementation
class LstmDemo(nn.Module):
def __init__(self, input_dim, hidden_size, num_layers):
super().__init__()
self.input_dim = input_dim
self.hidden_size = hidden_size
self.num_layers = num_layers
self.lstm = nn.LSTM(
input_size = input_dim,
hidden_size=hidden_size,
num_layers=num_layers,
)
self.out = nn.Linear(hidden_size, 1)
def forward(self, x, h_0_c_0):
        r_out, h_state = self.lstm(x, h_0_c_0)
        outs = []
        # h_state is the tuple (h_n, c_n)
for record in range(r_out.size(1)):
outs.append(self.out(r_out[:, record, :]))
return torch.stack(outs, dim=1), h_state
lstmDemo = LstmDemo(input_dim, hidden_size, num_layers).cuda()
optimizer = torch.optim.Adam(lstmDemo.parameters(), lr=lr)
loss_func = nn.MSELoss()
h_c_state = (torch.zeros(num_layers, 1, hidden_size).cuda(), torch.zeros(num_layers, 1, hidden_size).cuda() )
for step in range(epochs):
    start, end = step * np.pi, (step + 1) * np.pi  # time span of this segment
steps = np.linspace(start, end, bins, dtype=np.float32, endpoint=False)
x_np = np.sin(steps)
y_np = np.cos(steps)
    # shape [20, 1, 1], i.e. (time_step, batch, input_size)
    x = torch.from_numpy(x_np).unsqueeze(1).unsqueeze(2).cuda()
    y = torch.from_numpy(y_np).unsqueeze(1).unsqueeze(2).cuda()
prediction, h_state = lstmDemo(x, h_c_state)
h_c_state = (h_state[0].detach(), h_state[1].detach() )
loss = loss_func(prediction, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
if step % 100 == 0:
print('-- loss : {:.8f}'.format(loss))
plt.scatter(steps, y_np, marker='^')
plt.scatter(steps, prediction.cpu().data.numpy().flatten(), marker='^')
plt.show()
GRU
- Functionally equivalent to the LSTM but with a simpler model: the forget gate and input gate are merged into a single update gate; it is faster than the LSTM. The update equations are sketched after this list.
- Paper: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
https://arxiv.org/pdf/1412.3555.pdf
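For reference, the GRU update equations as implemented by nn.GRU, showing how the reset gate r_t and the single update gate z_t replace the LSTM's input and forget gates:
r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr})              (reset gate)
z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz})              (update gate)
n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn}))   (candidate state)
h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}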
Code implementation:
class GRUDemo(nn.Module):
def __init__(self, input_dim, hidden_size, num_layers):
super().__init__()
self.input_dim = input_dim
self.hidden_size = hidden_size
self.num_layers = num_layers
self.gru = nn.GRU(
input_size = input_dim,
hidden_size=hidden_size,
num_layers=num_layers,
)
self.out = nn.Linear(hidden_size, 1)
def forward(self, x, h):
r_out, h_state = self.gru(x, h)
outs = []
        for record in range(r_out.size(1)):
outs.append(self.out(r_out[:, record, :]))
return torch.stack(outs, dim=1), h_state
gruDemo = GRUDemo(input_dim, hidden_size, num_layers).cuda()
optimizer = torch.optim.Adam(gruDemo.parameters(), lr=lr)
loss_func = nn.MSELoss()
h_state = torch.zeros(num_layers, 1, hidden_size).cuda()  # initial hidden state (a GRU has no separate cell state)
for step in range(epochs):
    start, end = step * np.pi, (step + 1) * np.pi  # time span of this segment
steps = np.linspace(start, end, bins, dtype=np.float32, endpoint=False)
x_np = np.sin(steps)
y_np = np.cos(steps)
    # shape [20, 1, 1], i.e. (time_step, batch, input_size)
x = torch.from_numpy(x_np).unsqueeze(1).unsqueeze(2).cuda()
y = torch.from_numpy(y_np).unsqueeze(1).unsqueeze(2).cuda()
    prediction, h_state = gruDemo(x, h_state)
    h_state = h_state.detach()
loss = loss_func(prediction, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
if step % 100 == 0:
print('-- loss : {:.8f}'.format(loss))
plt.scatter(steps, y_np, marker='^')
plt.scatter(steps, prediction.cpu().data.numpy().flatten(), marker='^')
plt.show()
2022-11-14