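Contents
Deep Learning Week 18: Learning Residual Networks and the ResNet50V2 Algorithm
1. Preface
2. My Environment
3. Paper Walkthrough
3.1 Pre-activation Design
3.2 Residual Unit Structure
4. Model Reproduction
4.1 Residual Block
4.2 Stacking Residual Blocks
4.3 ResNet50V2 Architecture Reproduction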
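1. Preface
- 🍨 This post is a study log from the 🔗365-Day Deep Learning Training Camp
- 🍖 Original author: K同学啊 | tutoring and custom projects available
With finals approaching, I was held up by all sorts of things this week and did not learn much, but I still want to keep checking in and show what I have learned. I may take a week off to sort things out and then study this week's material again. This post is therefore mainly a code reproduction; deeper study, including validation on a dataset and a PyTorch reproduction, is left for the next two weeks.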
2. My Environment
- Operating system: Windows 10
- Language: Python 3.8.0
- IDE: PyCharm 2023.2.3
- Deep learning framework: TensorFlow
- GPU and memory: RTX 3060 8G
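3. Paper Walkthrough
I spent a week reading through Kaiming He's paper at a high level. Because of time constraints I can only offer my own understanding, which may contain mistakes; corrections are welcome.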
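3.1 Pre-activation Design
ResNet: uses the traditional post-activation design, in which batch normalization (BN) and the ReLU activation come after the convolution layer.
ResNetV2: introduces the pre-activation design, moving BN and ReLU in front of the convolution layer. This design is called "Pre-Activation"; it changes how information and gradients flow and helps optimization.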
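The figure above shows five residual units: (a) original, (b) BN after addition, (c) ReLU before addition, (d) ReLU-only pre-activation, and (e) full pre-activation. Kaiming He et al. tried four new variants; the best result comes from (e) full pre-activation, followed by (a) original.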
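To make this abstract idea concrete, I asked Kimi.ai to explain it with a relay-race analogy, and I am trying to understand it that way. (Because of time constraints, much of this week's content comes from asking AI; I do not think AI is always accurate, so it still needs careful verification.)
- Original method: after each runner finishes, we give them an encouraging clap (the ReLU activation) to lift their spirits. They then hand the baton to the next runner, who does some warm-up (batch normalization, BN) before taking it.
- First variant: this time the runner does the warm-up right after finishing and only then gets the clap. The handover may get a little confused, and performance is worse than the original.
- Second variant: we clap for the runner before they take the baton, so they may be more motivated while running, but because the warm-up is insufficient the result is mediocre.
- Third variant: we only clap and skip the warm-up. Performance is about the same as the original, but without the warm-up the runners' potential is not fully realized.
- Fourth variant: the runner does the warm-up and gets the clap before taking the baton. They are both prepared and encouraged, so they run faster and perform best.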
From this we can see that pre-activation simplifies the information flow and makes optimization easier.
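The effect of moving the activation can be illustrated with a toy, pure-Python sketch. It is not the paper's experiment: BN is omitted, and a single scalar `weight` (a hypothetical stand-in for the conv layers) plays the role of the residual branch. With the branch switched off (`weight = 0.0`), the post-activation unit still clips negative values on the shortcut, while the pre-activation unit passes the input through unchanged.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def residual_fn(values, weight):
    # Toy residual branch: an elementwise scaling stands in for the conv layers.
    return [weight * v for v in values]


def post_activation_unit(x, weight):
    # Original ordering: add the shortcut first, then apply ReLU to the sum.
    branch = residual_fn(x, weight)
    return relu([a + b for a, b in zip(x, branch)])


def pre_activation_unit(x, weight):
    # Pre-activation ordering: ReLU before the weights, nothing after the addition.
    branch = residual_fn(relu(x), weight)
    return [a + b for a, b in zip(x, branch)]


def run_stack(unit, x, weight, depth):
    # Apply the same unit `depth` times in a row.
    for _ in range(depth):
        x = unit(x, weight)
    return x


x0 = [-2.0, -0.5, 1.0, 3.0]
post_out = run_stack(post_activation_unit, x0, weight=0.0, depth=10)
pre_out = run_stack(pre_activation_unit, x0, weight=0.0, depth=10)
print(post_out)  # [0.0, 0.0, 1.0, 3.0]
print(pre_out)   # [-2.0, -0.5, 1.0, 3.0]
```

In this toy setting, the ReLU after the addition is the only thing that alters the shortcut signal, which is one way to read the claim that pre-activation keeps the information path clean.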
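3.2 Residual Unit Structure
In deep learning, adding more layers to a network should in theory improve its performance, because a deeper network has more capacity to learn complex features. In practice, however, a network that is too deep becomes hard to train and its performance drops instead. The residual unit was created to address this.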
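A residual unit consists of an Identity Path and a Residual Function.
The Identity Path passes the input directly to the output of the unit without any processing, like a "shortcut" or "skip connection". As shown in the figure below, Kaiming He et al. proposed six different ways of using the shortcut in a residual network and examined how each affects information propagation.
They are: (a) the original; (b) a 0.5 constant scaling factor (which weakens the signal); (c), (d) and (e), which I do not understand yet; and (f) dropout applied to the shortcut, which randomly drops some information. I think their main purpose is to prevent overfitting and improve model efficiency. Their results are as follows:
The plain (a) original structure performs best; in other words, the identity mapping is the best shortcut.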
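One way to see why the identity shortcut wins is the recursion from the paper, restated here as a sketch. The notation is an assumption for this post: $x_l$ is the input of unit $l$, $\mathcal{F}$ the residual function with weights $W_l$, $\mathcal{E}$ the loss, and $\lambda_l$ a scalar on the shortcut (as in the scaling variant).

```latex
% Identity shortcut: one unit, then unrolled from unit l to a deeper unit L
x_{l+1} = x_l + \mathcal{F}(x_l, W_l)
\quad\Longrightarrow\quad
x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i)

% Gradient: the constant 1 carries the gradient from x_L straight back to x_l
\frac{\partial \mathcal{E}}{\partial x_l}
  = \frac{\partial \mathcal{E}}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i) \right)

% Scaled shortcut: the scalars multiply up along the shortcut path
x_{l+1} = \lambda_l x_l + \mathcal{F}(x_l, W_l)
\quad\Longrightarrow\quad
x_L = \Bigl( \prod_{i=l}^{L-1} \lambda_i \Bigr) x_l
      + \sum_{i=l}^{L-1} \hat{\mathcal{F}}(x_i, W_i)
```

Here $\hat{\mathcal{F}}$ absorbs the scalars into the residual functions. With $\lambda_i = 0.5$, the product shrinks exponentially with depth, so the direct path from $x_l$ to $x_L$ fades out; this is consistent with the scaled shortcut doing worse than the plain identity.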
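4. Model Reproduction
(I copied and pasted this part of the code because I have had too much going on lately. That is not good, and I will fix it as soon as I can!)
- Official API call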
tf.keras.applications.resnet_v2.ResNet50V2(
    include_top=True,
    weights='imagenet',
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation='softmax'
)
# ResNet50V2, ResNet101V2 and ResNet152V2 are built in exactly the same way; they differ only in the number of stacked Residual Blocks.
import tensorflow as tf
import tensorflow.keras.layers as layers
from tensorflow.keras.models import Model
4.1 Residual Block
def block2(x, filters, kernel_size=3, stride=1, conv_shortcut=False, name=None):
    """
    Residual block (pre-activation bottleneck).
    Arguments:
        x: input tensor.
        filters: integer, filters of the bottleneck layer.
        kernel_size: default 3, kernel size of the bottleneck layer.
        stride: default 1, stride of the 3x3 convolution and of the shortcut.
        conv_shortcut: default False, use convolution shortcut if True,
            otherwise identity shortcut.
        name: string, block label.
    Returns:
        Output tensor for the residual block.
    """
    # Pre-activation: BN and ReLU are applied before any convolution.
    preact = layers.BatchNormalization(name=name + '_preact_bn')(x)
    preact = layers.Activation('relu', name=name + '_preact_relu')(preact)
    if conv_shortcut:
        # Projection shortcut: 1x1 conv on the pre-activated input to match channels.
        shortcut = layers.Conv2D(4 * filters, 1, strides=stride, name=name + '_0_conv')(preact)
    else:
        # Identity shortcut; a 1x1 max pool only subsamples when stride > 1.
        shortcut = layers.MaxPooling2D(1, strides=stride)(x) if stride > 1 else x
    # 1x1 conv: reduce channels.
    x = layers.Conv2D(filters, 1, strides=1, use_bias=False, name=name + '_1_conv')(preact)
    x = layers.BatchNormalization(name=name + '_1_bn')(x)
    x = layers.Activation('relu', name=name + '_1_relu')(x)
    # 3x3 conv with explicit padding; this is where the stride is applied.
    x = layers.ZeroPadding2D(padding=((1, 1), (1, 1)), name=name + '_2_pad')(x)
    x = layers.Conv2D(filters,
                      kernel_size,
                      strides=stride,
                      use_bias=False,
                      name=name + '_2_conv')(x)
    x = layers.BatchNormalization(name=name + '_2_bn')(x)
    x = layers.Activation('relu', name=name + '_2_relu')(x)
    # 1x1 conv: expand channels to 4 * filters; no activation after the addition.
    x = layers.Conv2D(4 * filters, 1, name=name + '_3_conv')(x)
    x = layers.Add(name=name + '_out')([shortcut, x])
    return x
# ResNet50 (shortcut branch only; bn_axis is the channel axis, 3 for channels_last)
if conv_shortcut:
    shortcut = layers.Conv2D(4 * filters, 1, strides=stride, name=name + '_0_conv')(x)
    shortcut = layers.BatchNormalization(axis=bn_axis, epsilon=1.001e-5, name=name + '_0_bn')(shortcut)
else:
    shortcut = x
# ResNet50V2 (shortcut branch only): the difference is quite clear
if conv_shortcut:
    shortcut = layers.Conv2D(4 * filters, 1, strides=stride, name=name + '_0_conv')(preact)
else:
    # Note the extra conditional expression at the end
    shortcut = layers.MaxPooling2D(1, strides=stride)(x) if stride > 1 else x
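The `MaxPooling2D(1, strides=stride)` call on the identity shortcut can look odd at first. A minimal pure-Python sketch (the helper `max_pool2d` is hypothetical and only mimics 'valid' pooling on a list-of-lists, it is not the Keras implementation) suggests what it does: with a 1x1 window the maximum of each window is the value itself, so the layer simply keeps every `stride`-th pixel and adds no parameters.

```python
def max_pool2d(feature_map, pool_size, stride):
    """Plain 'valid' 2D max pooling on a list-of-lists feature map."""
    height, width = len(feature_map), len(feature_map[0])
    out = []
    for top in range(0, height - pool_size + 1, stride):
        row = []
        for left in range(0, width - pool_size + 1, stride):
            # Collect the pool_size x pool_size window and keep its maximum.
            window = [feature_map[top + dy][left + dx]
                      for dy in range(pool_size)
                      for dx in range(pool_size)]
            row.append(max(window))
        out.append(row)
    return out


# 4x4 map holding the values 0..15 row by row.
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
pooled = max_pool2d(feature_map, pool_size=1, stride=2)
sliced = [row[::2] for row in feature_map[::2]]  # plain subsampling
print(pooled)            # [[0, 2], [8, 10]]
print(pooled == sliced)  # True
```

This keeps the shortcut's spatial size in step with the strided 3x3 convolution in the residual branch, so the two tensors can be added.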
4.2 Stacking Residual Blocks
def stack2(x, filters, blocks, stride1=2, name=None):
    x = block2(x, filters, conv_shortcut=True, name=name + '_block1')
    for i in range(2, blocks):
        x = block2(x, filters, name=name + '_block' + str(i))
    x = block2(x, filters, stride=stride1, name=name + '_block' + str(blocks))
    return x
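To check how the stacks shape the feature maps, the sketch below traces only the spatial size and channel count, using the padding, kernel and stride values from the code in this post and assuming a 224x224 input. The helper names `conv_out` and `trace_resnet50v2` are made up for illustration. It also counts the weighted layers that give the network its "50": the stem conv, three convs per block, and the final dense layer (shortcut convs are conventionally not counted).

```python
def conv_out(size, kernel, stride, pad):
    # Output length along one axis for a conv/pool with explicit symmetric padding.
    return (size + 2 * pad - kernel) // stride + 1


def trace_resnet50v2(input_size=224):
    size = conv_out(input_size, kernel=7, stride=2, pad=3)  # conv1_pad + conv1_conv
    size = conv_out(size, kernel=3, stride=2, pad=1)        # pool1_pad + pool1_pool
    # (name, filters, blocks, stride1), as passed to stack2 in this post
    stacks = [("conv2", 64, 3, 2), ("conv3", 128, 4, 2),
              ("conv4", 256, 6, 2), ("conv5", 512, 3, 1)]
    shapes = []
    for name, filters, blocks, stride1 in stacks:
        # Only the last block of a stack uses stride1, in its padded 3x3 conv;
        # every other block keeps the spatial size.
        size = conv_out(size, kernel=3, stride=stride1, pad=1)
        shapes.append((name, size, 4 * filters))
    # 1 stem conv + 3 convs per block + 1 dense layer
    weighted_layers = 1 + 3 * sum(blocks for _, _, blocks, _ in stacks) + 1
    return shapes, weighted_layers


shapes, weighted_layers = trace_resnet50v2(224)
for name, size, channels in shapes:
    print(name, (size, size, channels))
# conv2 (28, 28, 256)
# conv3 (14, 14, 512)
# conv4 (7, 7, 1024)
# conv5 (7, 7, 2048)
print(weighted_layers)  # 50
```

Note that in this V2 layout the downsampling happens in the last block of each stack rather than the first, which is why `conv5` is called with `stride1=1`.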
4.3 ResNet50V2 Architecture Reproduction
def ResNet50V2(include_top=True,  # whether to include the fully connected layer at the top of the network
               preact=True,  # whether to use pre-activation
               use_bias=True,  # whether the first conv layer uses a bias
               weights='imagenet',  # kept to mirror the official signature; not used in this reproduction
               input_tensor=None,  # optional Keras tensor to use as the image input; not used here
               input_shape=None,
               pooling=None,
               classes=1000,  # number of classes to classify images into
               classifier_activation='softmax'):  # activation of the classification layer
    img_input = layers.Input(shape=input_shape)
    x = layers.ZeroPadding2D(padding=((3, 3), (3, 3)), name='conv1_pad')(img_input)
    x = layers.Conv2D(64, 7, strides=2, use_bias=use_bias, name='conv1_conv')(x)
    if not preact:
        x = layers.BatchNormalization(name='conv1_bn')(x)
        x = layers.Activation('relu', name='conv1_relu')(x)
    x = layers.ZeroPadding2D(padding=((1, 1), (1, 1)), name='pool1_pad')(x)
    x = layers.MaxPooling2D(3, strides=2, name='pool1_pool')(x)
    x = stack2(x, 64, 3, name='conv2')
    x = stack2(x, 128, 4, name='conv3')
    x = stack2(x, 256, 6, name='conv4')
    x = stack2(x, 512, 3, stride1=1, name='conv5')
    if preact:
        x = layers.BatchNormalization(name='post_bn')(x)
        x = layers.Activation('relu', name='post_relu')(x)
    if include_top:
        x = layers.GlobalAveragePooling2D(name='avg_pool')(x)
        x = layers.Dense(classes, activation=classifier_activation, name='predictions')(x)
    else:
        if pooling == 'avg':
            # GlobalAveragePooling2D averages all values in each channel of each image.
            # The result has no height or width dimension left, only the batch
            # and channel dimensions.
            # You can think of each feature map as shrinking to a single pixel.
            x = layers.GlobalAveragePooling2D(name='avg_pool')(x)
        elif pooling == 'max':
            x = layers.GlobalMaxPooling2D(name='max_pool')(x)
    model = Model(img_input, x)
    return model
if __name__ == '__main__':
    model = ResNet50V2(input_shape=(224, 224, 3))
    model.summary()
Model: "model"
__________________________________________________________________________________________________
conv5_block1_1_relu (Activation (None, 7, 7, 512) 0 conv5_block1_1_bn[0][0]
__________________________________________________________________________________________________
conv5_block1_2_pad (ZeroPadding (None, 9, 9, 512) 0 conv5_block1_1_relu[0][0]
__________________________________________________________________________________________________
conv5_block1_2_conv (Conv2D) (None, 7, 7, 512) 2359296 conv5_block1_2_pad[0][0]
__________________________________________________________________________________________________
conv5_block1_2_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block1_2_conv[0][0]
__________________________________________________________________________________________________
conv5_block1_2_relu (Activation (None, 7, 7, 512) 0 conv5_block1_2_bn[0][0]
__________________________________________________________________________________________________
conv5_block1_0_conv (Conv2D) (None, 7, 7, 2048) 2099200 conv5_block1_preact_relu[0][0]
__________________________________________________________________________________________________
conv5_block1_3_conv (Conv2D) (None, 7, 7, 2048) 1050624 conv5_block1_2_relu[0][0]
__________________________________________________________________________________________________
conv5_block1_out (Add) (None, 7, 7, 2048) 0 conv5_block1_0_conv[0][0]
conv5_block1_3_conv[0][0]
__________________________________________________________________________________________________
conv5_block2_preact_bn (BatchNo (None, 7, 7, 2048) 8192 conv5_block1_out[0][0]
__________________________________________________________________________________________________
conv5_block2_preact_relu (Activ (None, 7, 7, 2048) 0 conv5_block2_preact_bn[0][0]
__________________________________________________________________________________________________
conv5_block2_1_conv (Conv2D) (None, 7, 7, 512) 1048576 conv5_block2_preact_relu[0][0]
__________________________________________________________________________________________________
conv5_block2_1_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block2_1_conv[0][0]
__________________________________________________________________________________________________
conv5_block2_1_relu (Activation (None, 7, 7, 512) 0 conv5_block2_1_bn[0][0]
__________________________________________________________________________________________________
conv5_block2_2_pad (ZeroPadding (None, 9, 9, 512) 0 conv5_block2_1_relu[0][0]
__________________________________________________________________________________________________
conv5_block2_2_conv (Conv2D) (None, 7, 7, 512) 2359296 conv5_block2_2_pad[0][0]
__________________________________________________________________________________________________
conv5_block2_2_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block2_2_conv[0][0]
__________________________________________________________________________________________________
conv5_block2_2_relu (Activation (None, 7, 7, 512) 0 conv5_block2_2_bn[0][0]
__________________________________________________________________________________________________
conv5_block2_3_conv (Conv2D) (None, 7, 7, 2048) 1050624 conv5_block2_2_relu[0][0]
__________________________________________________________________________________________________
conv5_block2_out (Add) (None, 7, 7, 2048) 0 conv5_block1_out[0][0]
conv5_block2_3_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_preact_bn (BatchNo (None, 7, 7, 2048) 8192 conv5_block2_out[0][0]
__________________________________________________________________________________________________
conv5_block3_preact_relu (Activ (None, 7, 7, 2048) 0 conv5_block3_preact_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_1_conv (Conv2D) (None, 7, 7, 512) 1048576 conv5_block3_preact_relu[0][0]
__________________________________________________________________________________________________
conv5_block3_1_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block3_1_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_1_relu (Activation (None, 7, 7, 512) 0 conv5_block3_1_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_2_pad (ZeroPadding (None, 9, 9, 512) 0 conv5_block3_1_relu[0][0]
__________________________________________________________________________________________________
conv5_block3_2_conv (Conv2D) (None, 7, 7, 512) 2359296 conv5_block3_2_pad[0][0]
__________________________________________________________________________________________________
conv5_block3_2_bn (BatchNormali (None, 7, 7, 512) 2048 conv5_block3_2_conv[0][0]
__________________________________________________________________________________________________
conv5_block3_2_relu (Activation (None, 7, 7, 512) 0 conv5_block3_2_bn[0][0]
__________________________________________________________________________________________________
conv5_block3_3_conv (Conv2D) (None, 7, 7, 2048) 1050624 conv5_block3_2_relu[0][0]
__________________________________________________________________________________________________
conv5_block3_out (Add) (None, 7, 7, 2048) 0 conv5_block2_out[0][0]
conv5_block3_3_conv[0][0]
__________________________________________________________________________________________________
post_bn (BatchNormalization) (None, 7, 7, 2048) 8192 conv5_block3_out[0][0]
__________________________________________________________________________________________________
post_relu (Activation) (None, 7, 7, 2048) 0 post_bn[0][0]
__________________________________________________________________________________________________
avg_pool (GlobalAveragePooling2 (None, 2048) 0 post_relu[0][0]
__________________________________________________________________________________________________
predictions (Dense) (None, 1000) 2049000 avg_pool[0][0]
==================================================================================================
Total params: 25,613,800
Trainable params: 25,568,360
Non-trainable params: 45,440
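The parameter counts in the summary above can be cross-checked by hand. The sketch below uses the standard formulas for conv, batch-norm and dense layers; the helper names are made up for illustration. The 1024 input channels of `conv5_block1_0_conv` are inferred from the code (the `conv4` stack outputs 4 * 256 channels) because that part of the summary is not shown above.

```python
def conv2d_params(kernel, in_ch, out_ch, use_bias=True):
    # kernel x kernel weights per (input, output) channel pair, plus optional biases
    return kernel * kernel * in_ch * out_ch + (out_ch if use_bias else 0)


def batchnorm_params(channels):
    # gamma and beta (trainable) + moving mean and variance (non-trainable)
    return 4 * channels


def dense_params(in_features, out_features):
    return in_features * out_features + out_features


checks = {
    # shortcut projection, 1024 -> 2048 channels (input width inferred, see above)
    "conv5_block1_0_conv": conv2d_params(1, 1024, 2048),
    # bottleneck convs; the first two are built with use_bias=False in block2
    "conv5_block2_1_conv": conv2d_params(1, 2048, 512, use_bias=False),
    "conv5_block2_2_conv": conv2d_params(3, 512, 512, use_bias=False),
    "conv5_block2_3_conv": conv2d_params(1, 512, 2048),
    "conv5_block2_preact_bn": batchnorm_params(2048),
    "predictions": dense_params(2048, 1000),
}
for layer_name, count in checks.items():
    print(layer_name, count)
# conv5_block1_0_conv 2099200
# conv5_block2_1_conv 1048576
# conv5_block2_2_conv 2359296
# conv5_block2_3_conv 1050624
# conv5_block2_preact_bn 8192
# predictions 2049000
```

Each value matches the corresponding row of the summary, which is a quick sanity check that the reproduced block wiring (bias usage and channel widths) agrees with what Keras built.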