This week I watched the image-processing course by the bilibili uploader 霹雳吧啦Wz.
Course link: 霹雳吧啦Wz的个人空间-霹雳吧啦Wz个人主页-哔哩哔哩视频
Below is a summary of what I covered this week.
Image Classification with GoogLeNet
GoogLeNet is a convolutional neural network architecture proposed by Google that achieved notable success in the 2014 ImageNet (ILSVRC) competition.
Its core innovation is the Inception module, which runs convolutions of several kernel sizes and a max-pooling operation in parallel, so that multi-scale features can be extracted within a single layer.
The overall GoogLeNet architecture is as follows:
The highlights of the GoogLeNet network are as follows:
The structure of the Inception module is shown below. The 1x1 convolutions highlighted in yellow in the right-hand figure are used for dimensionality reduction: a 1x1 convolution keeps the spatial size of the feature map while reducing the number of channels, which lowers the computational cost. The input passes through four different branches that extract features, and the branch outputs are concatenated along the channel dimension (so their channel counts add up).
Note: the feature maps produced by the branches must all have the same height and width.
The Inception structure above uses 1x1 convolutions for dimensionality reduction. Without them, as shown in the figure below, there would be far more parameters and the computation would be slower and harder.
With a 1x1 reduction, as shown in the figure below, the 512 input channels are first reduced to 24 before the 5x5 convolution, which requires far fewer parameters.
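A quick back-of-the-envelope check of the savings (a sketch; the 64 output channels of the 5x5 convolution are an assumption chosen for illustration, and biases are ignored):

def conv_params(k, c_in, c_out):
    # parameter count of a k x k convolution, ignoring biases
    return k * k * c_in * c_out

direct = conv_params(5, 512, 64)                             # plain 5x5 convolution on 512 channels
reduced = conv_params(1, 512, 24) + conv_params(5, 24, 64)   # 1x1 reduce to 24 channels, then 5x5
print(direct, reduced)  # 819200 vs 50688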
To alleviate the vanishing-gradient problem, GoogLeNet adds two auxiliary classifiers at intermediate layers of the network. They provide additional gradient signal and help train the deeper network, as shown in the figure below.
At the end of the network, GoogLeNet uses a global average pooling layer: averaging over all spatial positions of each feature map reduces the number of parameters and lowers the risk of overfitting.
In other words, before the main classifier GoogLeNet applies global average pooling, producing a feature map whose height and width are both 1, which greatly reduces the parameter count.
As a result, GoogLeNet has far fewer parameters than VGGNet.
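A rough way to verify this claim is to count the parameters of the torchvision reference implementations (a sketch; the weights=None argument assumes torchvision 0.13 or newer):

import torchvision

googlenet = torchvision.models.googlenet(weights=None, aux_logits=True, init_weights=True)
vgg16 = torchvision.models.vgg16(weights=None)
count = lambda m: sum(p.numel() for p in m.parameters())
print(f'GoogLeNet: {count(googlenet) / 1e6:.1f}M, VGG16: {count(vgg16) / 1e6:.1f}M')  # roughly 13M vs 138M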
Code Implementation
1. Define the BasicConv2d class, a convolution template that applies a convolution layer followed by the activation function (the imports used by the rest of the code are listed here as well).
import os
import sys
import json

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torch.utils.data import DataLoader
from tqdm import tqdm
from PIL import Image

class BasicConv2d(nn.Module):
    """
    Convolution template: a convolution layer followed by a ReLU activation
    """
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)
    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x
2. Define the Inception class, which implements GoogLeNet's Inception block. The block has four branches: a 1x1 convolution, a 3x3 convolution, a 5x5 convolution, and a pooling branch. Their outputs differ only in the number of channels; the height and width are identical, so the four outputs are concatenated along the channel dimension.
class Inception(nn.Module):
    # "red" is short for "reduce", i.e. the 1x1 convolution used for dimensionality reduction
def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
super(Inception, self).__init__()
self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)
self.branch2 = nn.Sequential(
BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),  # keep the output size equal to the input size
)
self.branch3 = nn.Sequential(
BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            BasicConv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),  # keep the output size equal to the input size
)
self.branch4 = nn.Sequential(
nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
BasicConv2d(in_channels, pool_proj, kernel_size=1)
)
def forward(self, x):
branch1 = self.branch1(x)
branch2 = self.branch2(x)
branch3 = self.branch3(x)
branch4 = self.branch4(x)
outputs = [branch1, branch2, branch3, branch4]
        # concatenate along dimension 1 (the channel dimension)
return torch.cat(outputs, dim=1)
3. Define the InceptionAux class. Auxiliary classifiers are attached after Inception blocks 4a and 4d; they are only active during training and help mitigate the vanishing-gradient problem.
class InceptionAux(nn.Module):
"""
    Auxiliary Classifier
"""
def __init__(self, in_channels, num_classes):
super(InceptionAux, self).__init__()
        # average pooling layer
self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
self.conv = BasicConv2d(in_channels, 128, kernel_size=1) # output (batch,128,4,4)
self.fc1 = nn.Linear(2048, 1024)
self.fc2 = nn.Linear(1024, num_classes)
def forward(self, x):
# aux1 N*512*14*14 aux2 N*528*14*14
x = self.averagePool(x)
# aux1 N*512*4*4 aux2 N*528*4*4
x = self.conv(x)
# N*128*4*4
        x = torch.flatten(x, start_dim=1)  # flatten from the channel dimension onwards -> N x 2048
        # dropout is only active in training mode: model.train() sets self.training=True, model.eval() sets it to False
x = F.dropout(x, 0.5, training=self.training)
# N * 2048
x = F.relu(self.fc1(x), inplace=True)
x = F.dropout(x, 0.5, training=self.training)
# N * 1024
x = self.fc2(x)
return x
4. Define the GoogLeNet network.
class GoogLeNet(nn.Module):
    # aux_logits: whether to build the auxiliary classifiers
def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
super(GoogLeNet, self).__init__()
self.aux_logits = aux_logits
self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)  # ceil_mode=True rounds the output size up instead of down
self.conv2 = BasicConv2d(64, 64, kernel_size=1)
self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)
if self.aux_logits:
self.aux1 = InceptionAux(512, num_classes)
self.aux2 = InceptionAux(528, num_classes)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # global average pooling: output height and width become 1
self.dropout = nn.Dropout(0.4)
self.fc = nn.Linear(1024, num_classes)
if init_weights:
self._initialize_weights()
def forward(self, x):
# N x 3 x 224 x 224
x = self.conv1(x)
# N x 64 x 112 x 112
x = self.maxpool1(x)
# N x 64 x 56 x 56
x = self.conv2(x)
# N x 64 x 56 x 56
x = self.conv3(x)
# N x 192 x 56 x 56
x = self.maxpool2(x)
# N x 192 x 28 x 28
x = self.inception3a(x)
# N x 256 x 28 x 28
x = self.inception3b(x)
# N x 480 x 28 x 28
x = self.maxpool3(x)
# N x 480 x 14 x 14
x = self.inception4a(x)
# N x 512 x 14 x 14
        if self.training and self.aux_logits:  # the auxiliary classifiers are only used when the model is in training mode
aux1 = self.aux1(x)
x = self.inception4b(x)
# N x 512 x 14 x 14
x = self.inception4c(x)
# N x 512 x 14 x 14
x = self.inception4d(x)
# N x 528 x 14 x 14
        if self.training and self.aux_logits:  # skipped in eval mode
aux2 = self.aux2(x)
x = self.inception4e(x)
# N x 832 x 14 x 14
x = self.maxpool4(x)
# N x 832 x 7 x 7
x = self.inception5a(x)
# N x 832 x 7 x 7
x = self.inception5b(x)
# N x 1024 x 7 x 7
x = self.avgpool(x)
# N x 1024 x 1 x 1
x = torch.flatten(x, 1)
# N x 1024
x = self.dropout(x)
x = self.fc(x)
# N x 1000 (num_classes)
        if self.training and self.aux_logits:  # skipped in eval mode
            return x, aux2, aux1  # main classifier output plus the two auxiliary classifier outputs
        return x  # main classifier only
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
nn.init.constant_(m.bias, 0)
5. Select the GPU device.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
6. Preprocess the images and apply data augmentation.
data_transform = {
'train': torchvision.transforms.Compose([
torchvision.transforms.RandomResizedCrop(224),
torchvision.transforms.RandomHorizontalFlip(),
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
]),
'val': torchvision.transforms.Compose([
torchvision.transforms.Resize((224, 224)),
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
}
7. Define the training and validation data loaders.
image_path = '../data/flower_data'
train_dataset = torchvision.datasets.ImageFolder(root=os.path.join(image_path, 'train'),
transform=data_transform['train'])
train_num = len(train_dataset) # 3306
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
val_dataset = torchvision.datasets.ImageFolder(root=os.path.join(image_path, 'val'), transform=data_transform['val'])
val_num = len(val_dataset) # 364
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0)
train_steps = len(train_loader) # 104 3306/32
val_steps = len(val_loader) # 12 364/32
8. Invert the flower dataset's class-to-index mapping so that the key is the index and the value is the class name, and save it to a JSON file.
flower_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())
json_str = json.dumps(cla_dict, indent=4)
with open('class_indices.json', 'w') as f:
    f.write(json_str)
'''
{
"0": "daisy",
"1": "dandelion",
"2": "roses",
"3": "sunflowers",
"4": "tulips"
}
'''
9. Instantiate the GoogLeNet network, and define the loss function and optimizer.
net = GoogLeNet(num_classes=5, aux_logits=True, init_weights=True)
net.to(device)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.0003)
10. Train on the flower dataset.
Note: during training, the losses of the main classifier and the two auxiliary classifiers are all computed, and the final loss is their weighted sum (here the auxiliary losses are weighted by 0.3).
best_acc = 0.0
save_path = './GoogLeNet.pth'
epochs = 30
for epoch in range(epochs):
# train
net.train()
running_loss = 0.0
train_bar = tqdm(train_loader, file=sys.stdout)
for step, data in enumerate(train_bar):
images, labels = data
optimizer.zero_grad()
logits, aux_logits2, aux_logits1 = net(images.to(device))
loss0 = loss_function(logits, labels.to(device))
loss1 = loss_function(aux_logits1, labels.to(device))
loss2 = loss_function(aux_logits2, labels.to(device))
loss = loss0 + loss1 * 0.3 + loss2 * 0.3
loss.backward()
optimizer.step()
running_loss += loss.item()
train_bar.desc = f'train epoch[{epoch + 1}/{epochs}] loss:{loss:.3f}'
    # validation: the auxiliary classifiers are ignored; only the main classifier is evaluated
net.eval()
acc = 0.0
with torch.no_grad():
val_bar = tqdm(val_loader, file=sys.stdout)
for step, val_data in enumerate(val_bar):
val_images, val_labels = val_data
outputs = net(val_images.to(device))
predict_y = torch.max(outputs, dim=1)[1]
acc += torch.eq(predict_y, val_labels.to(device)).sum().item()
accurate = acc / val_num
print(f'[epoch {epoch + 1}] train_loss: {running_loss / train_steps:.3f} val_accuracy: {accurate:.3f}')
if accurate > best_acc:
best_acc = accurate
torch.save(net.state_dict(), save_path)
print('Finished Training')
'''
train epoch[1/30] loss:1.448: 100%|██████████| 104/104 [00:21<00:00, 4.80it/s]
100%|██████████| 12/12 [00:01<00:00, 6.36it/s]
[epoch 1] train_loss: 1.490 val_accuracy: 0.626
train epoch[2/30] loss:1.987: 100%|██████████| 104/104 [00:22<00:00, 4.68it/s]
100%|██████████| 12/12 [00:01<00:00, 6.76it/s]
[epoch 2] train_loss: 1.493 val_accuracy: 0.604
train epoch[3/30] loss:0.985: 100%|██████████| 104/104 [00:21<00:00, 4.73it/s]
100%|██████████| 12/12 [00:01<00:00, 7.11it/s]
[epoch 3] train_loss: 1.384 val_accuracy: 0.679
train epoch[4/30] loss:1.274: 100%|██████████| 104/104 [00:21<00:00, 4.80it/s]
100%|██████████| 12/12 [00:01<00:00, 7.23it/s]
[epoch 4] train_loss: 1.380 val_accuracy: 0.676
train epoch[5/30] loss:1.055: 100%|██████████| 104/104 [00:22<00:00, 4.69it/s]
100%|██████████| 12/12 [00:01<00:00, 6.72it/s]
[epoch 5] train_loss: 1.339 val_accuracy: 0.692
train epoch[6/30] loss:1.568: 100%|██████████| 104/104 [00:21<00:00, 4.83it/s]
100%|██████████| 12/12 [00:01<00:00, 7.29it/s]
[epoch 6] train_loss: 1.264 val_accuracy: 0.706
train epoch[7/30] loss:1.550: 100%|██████████| 104/104 [00:21<00:00, 4.75it/s]
100%|██████████| 12/12 [00:01<00:00, 6.59it/s]
[epoch 7] train_loss: 1.224 val_accuracy: 0.720
train epoch[8/30] loss:0.771: 100%|██████████| 104/104 [00:21<00:00, 4.82it/s]
100%|██████████| 12/12 [00:01<00:00, 7.18it/s]
[epoch 8] train_loss: 1.144 val_accuracy: 0.698
train epoch[9/30] loss:2.318: 100%|██████████| 104/104 [00:21<00:00, 4.90it/s]
100%|██████████| 12/12 [00:01<00:00, 7.16it/s]
[epoch 9] train_loss: 1.189 val_accuracy: 0.717
train epoch[10/30] loss:0.495: 100%|██████████| 104/104 [00:21<00:00, 4.73it/s]
100%|██████████| 12/12 [00:01<00:00, 6.88it/s]
[epoch 10] train_loss: 1.137 val_accuracy: 0.690
train epoch[11/30] loss:0.274: 100%|██████████| 104/104 [00:21<00:00, 4.75it/s]
100%|██████████| 12/12 [00:01<00:00, 7.16it/s]
[epoch 11] train_loss: 1.108 val_accuracy: 0.695
train epoch[12/30] loss:0.913: 100%|██████████| 104/104 [00:21<00:00, 4.79it/s]
100%|██████████| 12/12 [00:01<00:00, 6.85it/s]
[epoch 12] train_loss: 1.120 val_accuracy: 0.698
train epoch[13/30] loss:1.103: 100%|██████████| 104/104 [00:21<00:00, 4.74it/s]
100%|██████████| 12/12 [00:01<00:00, 6.95it/s]
[epoch 13] train_loss: 1.037 val_accuracy: 0.670
train epoch[14/30] loss:1.682: 100%|██████████| 104/104 [00:21<00:00, 4.84it/s]
100%|██████████| 12/12 [00:01<00:00, 7.12it/s]
[epoch 14] train_loss: 1.081 val_accuracy: 0.736
train epoch[15/30] loss:1.607: 100%|██████████| 104/104 [00:22<00:00, 4.69it/s]
100%|██████████| 12/12 [00:01<00:00, 6.90it/s]
[epoch 15] train_loss: 0.998 val_accuracy: 0.736
train epoch[16/30] loss:0.204: 100%|██████████| 104/104 [00:21<00:00, 4.74it/s]
100%|██████████| 12/12 [00:01<00:00, 6.93it/s]
[epoch 16] train_loss: 0.981 val_accuracy: 0.750
train epoch[17/30] loss:0.499: 100%|██████████| 104/104 [00:21<00:00, 4.77it/s]
100%|██████████| 12/12 [00:01<00:00, 6.72it/s]
[epoch 17] train_loss: 0.958 val_accuracy: 0.736
train epoch[18/30] loss:0.666: 100%|██████████| 104/104 [00:22<00:00, 4.66it/s]
100%|██████████| 12/12 [00:01<00:00, 7.21it/s]
[epoch 18] train_loss: 0.949 val_accuracy: 0.777
train epoch[19/30] loss:1.036: 100%|██████████| 104/104 [00:21<00:00, 4.73it/s]
100%|██████████| 12/12 [00:01<00:00, 7.26it/s]
[epoch 19] train_loss: 0.954 val_accuracy: 0.761
train epoch[20/30] loss:1.162: 100%|██████████| 104/104 [00:22<00:00, 4.70it/s]
100%|██████████| 12/12 [00:01<00:00, 6.83it/s]
[epoch 20] train_loss: 0.896 val_accuracy: 0.772
train epoch[21/30] loss:0.682: 100%|██████████| 104/104 [00:21<00:00, 4.81it/s]
100%|██████████| 12/12 [00:01<00:00, 6.87it/s]
[epoch 21] train_loss: 0.924 val_accuracy: 0.755
train epoch[22/30] loss:1.488: 100%|██████████| 104/104 [00:21<00:00, 4.76it/s]
100%|██████████| 12/12 [00:01<00:00, 6.91it/s]
[epoch 22] train_loss: 0.880 val_accuracy: 0.758
train epoch[23/30] loss:1.137: 100%|██████████| 104/104 [00:21<00:00, 4.75it/s]
100%|██████████| 12/12 [00:01<00:00, 6.99it/s]
[epoch 23] train_loss: 0.866 val_accuracy: 0.766
train epoch[24/30] loss:0.498: 100%|██████████| 104/104 [00:21<00:00, 4.82it/s]
100%|██████████| 12/12 [00:01<00:00, 6.97it/s]
[epoch 24] train_loss: 0.872 val_accuracy: 0.753
train epoch[25/30] loss:0.650: 100%|██████████| 104/104 [00:21<00:00, 4.80it/s]
100%|██████████| 12/12 [00:01<00:00, 6.75it/s]
[epoch 25] train_loss: 0.798 val_accuracy: 0.786
train epoch[26/30] loss:1.176: 100%|██████████| 104/104 [00:21<00:00, 4.83it/s]
100%|██████████| 12/12 [00:01<00:00, 7.06it/s]
[epoch 26] train_loss: 0.801 val_accuracy: 0.780
train epoch[27/30] loss:0.439: 100%|██████████| 104/104 [00:21<00:00, 4.84it/s]
100%|██████████| 12/12 [00:01<00:00, 6.96it/s]
[epoch 27] train_loss: 0.874 val_accuracy: 0.720
train epoch[28/30] loss:0.958: 100%|██████████| 104/104 [00:21<00:00, 4.76it/s]
100%|██████████| 12/12 [00:01<00:00, 7.07it/s]
[epoch 28] train_loss: 0.834 val_accuracy: 0.819
train epoch[29/30] loss:0.478: 100%|██████████| 104/104 [00:22<00:00, 4.60it/s]
100%|██████████| 12/12 [00:01<00:00, 6.50it/s]
[epoch 29] train_loss: 0.803 val_accuracy: 0.775
train epoch[30/30] loss:0.976: 100%|██████████| 104/104 [00:22<00:00, 4.60it/s]
100%|██████████| 12/12 [00:01<00:00, 6.97it/s]
[epoch 30] train_loss: 0.754 val_accuracy: 0.780
Finished Training
'''
11. After the model has been saved, run prediction on an image; first define the preprocessing for the image to be predicted.
data_transform = torchvision.transforms.Compose([
torchvision.transforms.Resize((224, 224)),
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
img = Image.open('../Test2_alexnet/tulip.jpg')
img = data_transform(img)
# img = img.unsqueeze(0)
img = torch.unsqueeze(img, dim=0)
try:
json_file = open('./class_indices.json','r')
class_indices = json.load(json_file)
except Exception as e:
print(e)
exit(-1)
12. Instantiate the GoogLeNet model and load the saved weights.
model = GoogLeNet(num_classes=5,aux_logits=False)
model_weight_path = './GoogLeNet.pth'
# strict=False: the model was built without the auxiliary classifiers, so their weights in the checkpoint are reported as unexpected keys instead of raising an error
missing_keys,unexpected_keys = model.load_state_dict(torch.load(model_weight_path,map_location='cpu'), strict=False)
print(missing_keys)
print(unexpected_keys)
'''
[]
['aux1.conv.conv.weight', 'aux1.conv.conv.bias', 'aux1.fc1.weight', 'aux1.fc1.bias', 'aux1.fc2.weight', 'aux1.fc2.bias', 'aux2.conv.conv.weight', 'aux2.conv.conv.bias', 'aux2.fc1.weight', 'aux2.fc1.bias', 'aux2.fc2.weight', 'aux2.fc2.bias']
'''
13. Run the prediction.
model.eval()
with torch.no_grad():
output = torch.squeeze(model(img))
predict = torch.softmax(output,dim=-1)
# predict = torch.max(predict_y,dim=1)[1]
predict_cla = torch.argmax(predict).numpy()
print(class_indices[str(predict_cla)],predict[predict_cla].item())
'''
tulips 0.9999539852142334
'''
Image Classification with ResNet
ResNet (the residual network) is a deep neural network architecture. By introducing residual (shortcut) connections that let the signal skip over some layers, it alleviates the vanishing-gradient problem in very deep networks.
The ResNet architecture is as follows:
The highlights of the network are as follows:
The Residual block adds the output of the main branch to the input element-wise; it is an addition, not a concatenation, so the height, width and channel count of the two tensors must be identical.
Moreover, for a plain (non-residual) network, deeper is not always better: as shown in the figure below, a 56-layer plain network can perform much worse than a 20-layer one. Plain networks suffer from vanishing/exploding gradients and from the degradation problem, whereas ResNet largely avoids them; as the depth increases, the loss of a ResNet keeps decreasing.
In the bottleneck Residual structure, 1x1 convolutions are used to reduce and then restore the channel dimension. The parameter count of each convolution is input channels x output channels x kernel size, summed over the layers; comparing the two designs shows that using 1x1 convolutions for dimensionality reduction needs far fewer parameters, as shown in the figure below:
The Residual structure adds the input to the output of the main branch through a shortcut connection, which requires the two tensors to have the same dimensions. When they differ, the dashed-line shortcut is used instead: a 1x1 convolution is inserted on the shortcut to change the channel count and adjust the height and width so that the shortcut output matches the main-branch output.
Batch Normalization
During image preprocessing we already standardize the input images, which speeds up convergence. Batch Normalization extends this idea inside the network: it makes each feature dimension (channel) follow a distribution with mean 0 and variance 1, and in practice it is usually inserted between a convolution layer and the ReLU layer.
Note: BN adjusts the distribution of a whole batch of data, not of a single sample.
As shown in the figure below, with two feature channels, the mean and variance are computed per channel over all samples of the batch, and the feature map is then normalized with the BN formula.
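A small numeric check of this behaviour (a sketch using nn.BatchNorm2d with affine=False, so there are no learnable scale/shift parameters to worry about):

import torch
import torch.nn as nn

x = torch.randn(8, 2, 4, 4)                 # a batch of 8 feature maps with 2 channels
bn = nn.BatchNorm2d(2, affine=False)        # training mode by default
y = bn(x)

mean = x.mean(dim=(0, 2, 3), keepdim=True)  # per-channel mean over the whole batch
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
print(torch.allclose(y, (x - mean) / torch.sqrt(var + bn.eps), atol=1e-6))  # True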
Reference: Batch Normalization详解以及pytorch实验 (CSDN blog)
Transfer Learning
Transfer learning lets us reach a good result quickly, and it can produce a reasonable model even when the dataset is small.
What gets transferred are the generic features and information the network has already learned: as shown in the figure below, the earlier layers capture generic, low-level capabilities, so transferring those low-level weights lets us train to a good result quickly.
The common transfer learning strategies are as follows:
Strategies 2 and 3 train faster, while strategy 1 gives more accurate results; a minimal sketch of strategy 2 is shown below.
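A minimal sketch of strategy 2 (load pretrained weights, freeze the backbone, train only a new classification head), assuming a torchvision ResNet-34 backbone and torchvision 0.13+:

import torch.nn as nn
import torchvision

net = torchvision.models.resnet34(weights=torchvision.models.ResNet34_Weights.IMAGENET1K_V1)
for p in net.parameters():
    p.requires_grad = False                  # freeze all pretrained weights
net.fc = nn.Linear(net.fc.in_features, 5)    # new head for 5 classes; its parameters stay trainable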
Code Implementation
The ResNet architecture is as follows:
Before writing the code, we first need to understand the two forms of the ResNet block.
For ResNet-18 and ResNet-34, the two block forms are: in the first form, the number of output channels equals the number of input channels; in the second form they differ, the height and width are halved, and the number of channels is doubled.
For ResNet-50, ResNet-101 and ResNet-152, the two block forms are: in the first form, the number of output channels equals the number of input channels; in the second form they differ, and for the first block of the conv2_x stage only the channel count changes (height and width stay the same), while for the first block of every other stage both the channel count and the height/width change.
1. With the two block forms in mind, we first implement the block used by ResNet-18 and ResNet-34; the code is as follows:
In the code, expansion describes how the number of channels changes inside a block. For these two networks, the final output channels of each block equal the channels of its first convolution, so expansion is 1. downsample is the downsampling branch; it is used when the block changes the height/width and channel count.
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, in_channel, out_channel, stride=1, downsample=None):
super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=3, stride=stride,
                               padding=1, bias=False)  # no bias, since BatchNorm follows
self.bn1 = nn.BatchNorm2d(out_channel)
self.relu = nn.ReLU(inplace=True)
self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel, kernel_size=3, stride=1, padding=1,
bias=False)
self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample  # optional downsampling branch for the shortcut
def forward(self, x):
identity = x
        if self.downsample is not None:  # if downsample is None this is the solid-line shortcut and the input is used as-is
identity = self.downsample(x)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out += identity
out = self.relu(out)
return out
2. Now we implement the block used by ResNet-50, ResNet-101 and ResNet-152; the code is as follows:
Here expansion is 4, meaning the final output channels of each block are 4 times the channels of its first convolution, i.e. the third convolution has 4 times as many filters as the first and second. Unlike the previous two networks, the block first reduces the channel count with a 1x1 convolution, then extracts features with a 3x3 convolution, and finally restores the channel count with another 1x1 convolution. downsample again denotes the downsampling branch used when the height/width and channel count change.
class Bottleneck(nn.Module):
    expansion = 4  # the third convolution has 4x as many filters as the first two
def __init__(self, in_channel, out_channel, stride=1, downsample=None):
super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=1, stride=1, bias=False)
self.bn1 = nn.BatchNorm2d(out_channel)
self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel, kernel_size=3, stride=stride,
padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(out_channel)
self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion, kernel_size=1,
stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
def forward(self, x):
identity = x
if self.downsample is not None:
identity = self.downsample(x)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
out += identity
out = self.relu(out)
return out
3. Having implemented the ResNet blocks, we now define the ResNet network itself; the code is as follows:
include_top controls whether the final average-pooling and fully connected classification head are built (it defaults to True); leaving it out makes the backbone easier to reuse. The _make_layer function builds each stage, including the second block form; its channel argument is the number of filters used by the first convolution of the residual blocks in that stage, and the final output channel count differs between ResNet-18/34 and ResNet-50/101/152.
Note that for ResNet-50/101/152, the first block of layer1 only changes the channel count and not the height/width, while the first block of every other stage changes both, i.e. the second block form; the remaining blocks of each stage use the first form, whose output channels equal its input channels. For ResNet-18/34, layer1 changes neither the spatial size nor the channel count, and the first block of every other stage doubles the channels and halves the height and width.
class ResNet(nn.Module):
def __init__(self, block, block_num, num_classes=1000, include_top=True):
super(ResNet, self).__init__()
self.include_top = include_top
        self.in_channel = 64  # number of channels after the max-pooling layer
self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.in_channel, kernel_size=7, stride=2, padding=3,
bias=False)
self.bn1 = nn.BatchNorm2d(self.in_channel)
self.relu = nn.ReLU(inplace=True)
self.maxPool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, block_num[0])
self.layer2 = self._make_layer(block, 128, block_num[1], stride=2)
self.layer3 = self._make_layer(block, 256, block_num[2], stride=2)
self.layer4 = self._make_layer(block, 512, block_num[3], stride=2)
if self.include_top:
self.avgPool = nn.AdaptiveAvgPool2d((1, 1))
self.fc = nn.Linear(512 * block.expansion, num_classes)
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
def _make_layer(self, block, channel, block_num, stride=1):
downsample = None
if stride != 1 or self.in_channel != channel * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(channel * block.expansion),
)
layers = []
layers.append(block(self.in_channel, channel, stride, downsample))
self.in_channel = channel * block.expansion
for _ in range(1, block_num):
layers.append(block(self.in_channel, channel))
return nn.Sequential(*layers)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxPool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
if self.include_top:
x = self.avgPool(x)
x = torch.flatten(x, 1)
x = self.fc(x)
return x
4. Define the ResNet34 and ResNet101 functions, which return the ResNet network defined above; the code is as follows:
The numbers 3, 4, 6, 3 (and 3, 4, 23, 3) are the block counts of each stage; num_classes is the number of output classes.
def ResNet34(num_classes=1000, include_top=True):
return ResNet(BasicBlock, [3, 4, 6, 3], num_classes, include_top)
def ResNet101(num_classes=1000, include_top=True):
return ResNet(Bottleneck, [3, 4, 23, 3], num_classes, include_top)
5. Train the model. The data preprocessing and the training/validation datasets are the same as before. The pretrained ResNet-34 weights are downloaded from the PyTorch site at https://download.pytorch.org/models/resnet34-b627a593.pth. After loading the pretrained parameters into the network, remember to change the number of output features of the final layer, since ResNet defaults to 1000 output classes.
net = ResNet34()
# load the pretrained parameters
model_weight_path = './resnet34-pre.pth'
missing_keys, unexpected_keys = net.load_state_dict(torch.load(model_weight_path), strict=False)
print(f'missing_keys: {missing_keys}, unexpected_keys: {unexpected_keys}')
# change the number of output features
in_channel = net.fc.in_features
net.fc = nn.Linear(in_features=in_channel, out_features=5)
net.to(device)
6. The training code is the same as before.
'''
train epoch [1/5], loss: 0.5164: 100%|██████████| 207/207 [00:33<00:00, 6.10it/s]
100%|██████████| 23/23 [00:03<00:00, 7.45it/s]
epoch: 1, train loss: 0.5003, val acc: 0.8901
save model to /kaggle/working/Resnet34.pth
train epoch [2/5], loss: 0.6781: 100%|██████████| 207/207 [00:24<00:00, 8.28it/s]
100%|██████████| 23/23 [00:02<00:00, 10.96it/s]
epoch: 2, train loss: 0.3391, val acc: 0.9258
save model to /kaggle/working/Resnet34.pth
train epoch [3/5], loss: 0.3169: 100%|██████████| 207/207 [00:24<00:00, 8.38it/s]
100%|██████████| 23/23 [00:02<00:00, 11.15it/s]
epoch: 3, train loss: 0.2870, val acc: 0.8984
train epoch [4/5], loss: 0.1521: 100%|██████████| 207/207 [00:24<00:00, 8.29it/s]
100%|██████████| 23/23 [00:01<00:00, 11.52it/s]
epoch: 4, train loss: 0.2592, val acc: 0.9203
train epoch [5/5], loss: 0.6599: 100%|██████████| 207/207 [00:24<00:00, 8.29it/s]
100%|██████████| 23/23 [00:02<00:00, 10.61it/s]
epoch: 5, train loss: 0.2376, val acc: 0.9093
Finished Training
'''
7. After training, load the model weights and predict a single image.
model = ResNet34(num_classes=5)
model_weight_path = './ResNet34.pth'
model.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
model.eval()
with torch.no_grad():
output = torch.squeeze(model(img))
predict = torch.softmax(output, dim=-1)
    predict_cla = torch.argmax(predict).numpy()
    print(class_indices[str(predict_cla)], predict[predict_cla].item())
'''
tulips 0.9997768998146057
'''
Image Classification with ResNeXt
ResNeXt is a deep convolutional neural network architecture that builds on ResNet with additional innovations.
Network highlights: compared with ResNet, the block is redesigned and grouped convolution is introduced. The convolution layers split the input feature maps into several groups, convolve each group separately, and then merge the results, which lowers the computational cost while keeping good representational power.
In the figure below, the left side is the ResNet block and the right side is the ResNeXt block.
With a 224x224 input, ResNeXt-101 outperforms the original ResNet-101 and ResNet-200, so the ResNeXt design does improve accuracy.
ResNeXt-50 also needs slightly fewer parameters than ResNet-50.
- ResNeXt introduces grouped convolution, which lowers the computational cost and reduces the number of parameters. In the figure below, the top shows an ordinary convolution: with an input feature map of Cin channels and n kernels (each kernel also having Cin channels), the output feature map has n channels. The parameter count is kernel height x kernel width x input channels x output channels; with square kernels of size k, an ordinary convolution needs k*k*Cin*n parameters.
- Grouped convolution works as follows: the Cin input channels are split into g groups, the output feature map has n channels, so each group uses n/g kernels. Each group therefore needs k*k*(Cin/g)*(n/g) parameters (Cin/g input channels and n/g output channels per group), and with g groups the total is k*k*Cin*n*(1/g). A quick numeric check is given after this list.
- As long as the number of groups is greater than 1, fewer parameters are needed than with an ordinary convolution. In the extreme case where the number of groups equals the number of input channels and the input channel count equals the output channel count, each input channel gets its own single-channel kernel; this is exactly a depthwise (DW) convolution.
- The ResNeXt block is shown below; the three forms a, b and c are all equivalent, with c being the most compact and a the most explicit.
- In form a, if the input has 256 channels, it is split across 32 paths; each path first applies a 1x1 convolution whose output has 4 channels, which achieves dimensionality reduction, for a total of 32 x 4 = 128 channels.
- Each path then applies a 3x3 convolution whose output also has 4 channels.
- Finally each path applies a 1x1 convolution that brings the channels back to 256, and the per-path outputs are summed, which is equivalent to form b's concatenation followed by a single 1x1 convolution. Adding this result to the block's input completes the block.
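A quick numeric check of the parameter counts derived above (ignoring biases; the 128-channel example numbers are chosen only for illustration):

def conv_params(k, c_in, c_out, groups=1):
    # an ordinary convolution needs k*k*Cin*n parameters; a grouped one needs k*k*Cin*n/g
    return k * k * (c_in // groups) * (c_out // groups) * groups

print(conv_params(3, 128, 128))              # 147456: ordinary 3x3 convolution
print(conv_params(3, 128, 128, groups=32))   # 4608:   grouped convolution, 1/32 of the above
print(conv_params(3, 128, 128, groups=128))  # 1152:   depthwise (DW) convolution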
Why is the number of groups set to 32? The paper's experiments show that with 32 groups, ResNeXt-50 and ResNeXt-101 achieve the lowest error.
Finally, the paper notes that when a block has fewer than 3 layers, grouping does not reduce the complexity or the parameter count, because the result is mathematically equivalent to an ordinary convolution.
Code Implementation
ResNeXt introduces grouped convolution: the convolution layers split the input feature maps into groups, convolve each group separately, and merge the results, lowering the computational cost while keeping good representational power.
The ResNeXt design is only applied on top of ResNet-50/101/152, because the ResNeXt paper notes that grouping a block with fewer than 3 layers does not reduce the complexity or the parameter count.
The architectures of ResNet-50 and ResNeXt-50 are as follows:
We can see that ResNet-50 and ResNeXt-50 have the same number of blocks per stage; what changes is the number of channels at the start of each stage and the grouped convolution in the middle of each block. So we only need to add the number of groups and the width of each grouped convolution (width_per_group) to the original ResNet code.
The channel count at the start of each stage is computed as width = int(out_channel * (width_per_group / 64)) * groups; with 32 groups and a width of 4 this works out to 2 * out_channel. groups and width_per_group are the hyperparameters of the ResNeXt network.
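A quick check of this formula for the ResNeXt-50 (32x4d) setting:

out_channel, groups, width_per_group = 64, 32, 4
width = int(out_channel * (width_per_group / 64)) * groups
print(width)  # 128, i.e. 2 * out_channel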
1. Define the Bottleneck class, which is the ResNeXt block.
It takes the number of groups and the width of each grouped convolution as arguments; if no other values are passed, groups is 1 and width_per_group is 64, which reproduces the original ResNet behaviour without grouping.
class Bottleneck(nn.Module):
expansion = 4
    def __init__(self, in_channel, out_channel, stride=1, downsample=None, groups=1, width_per_group=64):  # e.g. groups=32, width_per_group=4 for ResNeXt-50
super(Bottleneck, self).__init__()
width = int(out_channel * (width_per_group / 64)) * groups # groups=32,width_per_group=4,width=out_channel*2
self.conv1 = nn.Conv2d(in_channel, width, kernel_size=1, bias=False)
self.bn1 = nn.BatchNorm2d(width)
self.conv2 = nn.Conv2d(width, width, groups=groups, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(width)
self.conv3 = nn.Conv2d(width, out_channel * self.expansion, kernel_size=1, bias=False)
self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
def forward(self, x):
identity = x
if self.downsample is not None:
identity = self.downsample(x)
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
out += identity
out = self.relu(out)
return out
2. Define the ResNeXt class; it adds the number of groups, and everything else is the same as the ResNet network.
class ResNeXt(nn.Module):
def __init__(self, block, blocks_num, num_classes=1000, include_top=True, groups=1, width_per_group=64):
super(ResNeXt, self).__init__()
self.include_top = include_top
self.in_channel = 64
self.groups = groups
self.width_per_group = width_per_group
self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = nn.BatchNorm2d(self.in_channel)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, blocks_num[0])
self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
if self.include_top:
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.fc = nn.Linear(512 * block.expansion, num_classes)
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
def _make_layer(self, block, channel, block_num, stride=1):
downsample = None
if stride != 1 or self.in_channel != channel * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(channel * block.expansion),
)
layers = []
layers.append(block(in_channel=self.in_channel, out_channel=channel, stride=stride, downsample=downsample,
groups=self.groups, width_per_group=self.width_per_group))
self.in_channel = channel * block.expansion
for _ in range(1, block_num):
layers.append(block(self.in_channel, channel, groups=self.groups, width_per_group=self.width_per_group))
return nn.Sequential(*layers)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
if self.include_top:
x = self.avgpool(x)
x = torch.flatten(x, 1)
x = self.fc(x)
return x
3. Define the resnext50_32x4d and resnext101_32x8d functions, which pass the number of groups and the width per group to the class, instantiate the ResNeXt object, and return it.
def resnext50_32x4d(num_classes=1000, include_top=True):
groups = 32
width_per_group = 4
return ResNeXt(Bottleneck, [3, 4, 6, 3], num_classes, include_top, groups, width_per_group)
def resnext101_32x8d(num_classes=1000, include_top=True):
groups = 32
width_per_group = 8
return ResNeXt(Bottleneck, [3, 4, 23, 3], num_classes, include_top, groups, width_per_group)
4. Train the model. The data preprocessing and the training/validation datasets are the same as before; the difference is that the ResNeXt network is instantiated and all weights except the fully connected layer are frozen.
First download the pretrained ResNeXt-50 weights: https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
net = resnext50_32x4d()
model_weight_path = './resnext50_pre.pth'
net.load_state_dict(torch.load(model_weight_path))
for param in net.parameters():
param.requires_grad = False
in_channel = net.fc.in_features
net.fc = nn.Linear(in_features=in_channel, out_features=5)
net.to(device)
5. Only train the parameters of the fully connected layer.
params = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=0.0001)
6. The training code is the same as before.
'''
train epoch [1/10], loss: 0.3461: 100%|██████████| 207/207 [00:21<00:00, 9.76it/s]
100%|██████████| 23/23 [00:02<00:00, 8.59it/s]
epoch: 1, train loss: 0.5362, val acc: 0.8709
save model to /kaggle/working/Resnext50.pth
train epoch [2/10], loss: 0.8194: 100%|██████████| 207/207 [00:21<00:00, 9.50it/s]
100%|██████████| 23/23 [00:02<00:00, 9.08it/s]
epoch: 2, train loss: 0.5200, val acc: 0.8846
save model to /kaggle/working/Resnext50.pth
train epoch [3/10], loss: 0.5423: 100%|██████████| 207/207 [00:21<00:00, 9.67it/s]
100%|██████████| 23/23 [00:02<00:00, 9.61it/s]
epoch: 3, train loss: 0.5047, val acc: 0.8929
save model to /kaggle/working/Resnext50.pth
train epoch [4/10], loss: 0.4425: 100%|██████████| 207/207 [00:21<00:00, 9.73it/s]
100%|██████████| 23/23 [00:02<00:00, 9.40it/s]
epoch: 4, train loss: 0.4673, val acc: 0.8901
train epoch [5/10], loss: 0.9804: 100%|██████████| 207/207 [00:21<00:00, 9.51it/s]
100%|██████████| 23/23 [00:02<00:00, 9.52it/s]
epoch: 5, train loss: 0.4588, val acc: 0.9011
save model to /kaggle/working/Resnext50.pth
train epoch [6/10], loss: 0.4052: 100%|██████████| 207/207 [00:21<00:00, 9.80it/s]
100%|██████████| 23/23 [00:02<00:00, 9.24it/s]
epoch: 6, train loss: 0.4590, val acc: 0.8874
train epoch [7/10], loss: 0.3596: 100%|██████████| 207/207 [00:21<00:00, 9.66it/s]
100%|██████████| 23/23 [00:02<00:00, 9.22it/s]
epoch: 7, train loss: 0.4366, val acc: 0.9038
save model to /kaggle/working/Resnext50.pth
train epoch [8/10], loss: 0.5366: 100%|██████████| 207/207 [00:21<00:00, 9.66it/s]
100%|██████████| 23/23 [00:02<00:00, 9.57it/s]
epoch: 8, train loss: 0.4463, val acc: 0.8874
train epoch [9/10], loss: 0.6280: 100%|██████████| 207/207 [00:21<00:00, 9.49it/s]
100%|██████████| 23/23 [00:02<00:00, 9.39it/s]
epoch: 9, train loss: 0.4193, val acc: 0.9121
save model to /kaggle/working/Resnext50.pth
train epoch [10/10], loss: 0.4213: 100%|██████████| 207/207 [00:21<00:00, 9.67it/s]
100%|██████████| 23/23 [00:02<00:00, 8.85it/s]
epoch: 10, train loss: 0.4153, val acc: 0.8984
Finished Training
'''
7. After training, load the model weights and predict a single image.
model = resnext50_32x4d(num_classes=5)
model_weight_path = './ResNext50.pth'
model.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
model.eval()
with torch.no_grad():
output = model(img)
output = torch.squeeze(output)
predict = torch.softmax(output,dim=-1)
idx = torch.argmax(predict,dim=-1).item()
print('img class: {}, predict class: {:.4f}'.format(class_indices[str(idx)],predict[idx]))
'''
img class: tulips, predict class: 0.9963
'''
8. To predict a batch of images, collect them into a list and stack them into a single batch.
img_path_list = ['tulip.jpg', 'rose.jpg']
img_list = []
for img_path in img_path_list:
assert os.path.exists(img_path), f'file {img_path} does not exist'
img = Image.open(img_path)
img = data_transform(img)
img_list.append(img)
batch_img = torch.stack(img_list, dim=0)
print('batch_img shape:', batch_img.shape)
'''
batch_img shape: torch.Size([2, 3, 224, 224])
'''
9. When predicting a batch, the model weights are loaded as before; finally, loop over the results and print each image's predicted class and probability.
model.eval()
with torch.no_grad():
    outputs = model(batch_img)
predict = torch.softmax(outputs, dim=-1)
idx_list = torch.argmax(predict, dim=-1).numpy()
for step, idx in enumerate(idx_list):
        print('image_path: {}, image_class: {}, image_predict: {:.4f}'.format(
            img_path_list[step], class_indices[str(idx)], predict[step][idx]))
'''
image_path: ../ResNext/tulip.jpg, image_class: tulips, image_predict: 0.9963
image_path: ../ResNext/rose.jpg, image_class: roses, image_predict: 0.9509
'''
MobileNet
MobileNet is an efficient convolutional neural network architecture designed for efficient computation and inference on mobile and edge devices.
MobileNet v1
The highlights of the MobileNet v1 network are as follows:
The core of MobileNet is the depthwise separable convolution. In an ordinary convolution, each kernel has as many channels as the input feature map, and the number of output channels equals the number of kernels. A DW (depthwise) convolution decomposes this into a lighter operation: each channel is convolved independently with its own single-channel kernel, so the number of input channels, kernels and output channels are all equal and no cross-channel mixing takes place.
The PW (pointwise) convolution follows the DW convolution and works like an ordinary convolution: every kernel is 1x1 and has as many channels as its input, and the number of kernels determines the final number of output channels.
An ordinary convolution and a DW+PW pair produce the same number of output channels, but DW+PW needs far less computation; a minimal sketch is shown below.
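A minimal sketch of a depthwise separable convolution in PyTorch (the BatchNorm and ReLU layers that MobileNet places between the two steps are omitted for brevity). Setting groups=in_channels gives each input channel its own single-channel kernel (the DW step), and the 1x1 convolution that follows mixes channels and sets the output depth (the PW step):

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.dw = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                            padding=1, groups=in_channels, bias=False)   # DW: one kernel per channel
        self.pw = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)  # PW: 1x1 channel mixing
    def forward(self, x):
        return self.pw(self.dw(x))

x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])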
The computation cost of a convolution is (kernel * kernel * map * map) * channel_input * channel_output,
where kernel is the kernel size, map is the size of the output feature map, channel_input is the depth of the input feature map and channel_output is the depth of the output feature map.
Let DF be the height/width of the feature map, DK the kernel size, M the input depth, and N the output depth (the number of kernels).
The computation cost of an ordinary convolution is then DK*DK*M*N*DF*DF (assuming stride 1, so the spatial size of the output is unchanged).
The computation cost of DW+PW is
DK*DK*M*DF*DF + M*N*DF*DF (again assuming stride 1).
As shown in the figure below, the ratio of DW+PW to an ordinary convolution is 1/N + 1/(DK*DK), so with a 3x3 kernel the ordinary convolution is theoretically about 8 to 9 times more expensive.
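A quick check of this ratio with some illustrative numbers (the values of DF, M and N are assumptions chosen only for the example):

DK, DF, M, N = 3, 56, 128, 128
ordinary = DK * DK * M * N * DF * DF             # ordinary convolution
dw_pw = DK * DK * M * DF * DF + M * N * DF * DF  # depthwise + pointwise
print(ordinary / dw_pw)  # about 8.4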
The MobileNet v1 architecture is as follows:
Here α is the width multiplier (it scales the number of filters) and β is the resolution multiplier, i.e. the input size. Experiments show that although MobileNet is slightly less accurate than VGG16, it needs far less computation and far fewer parameters.
MobileNet v2
Compared with MobileNet v1, MobileNet v2 is more accurate and the model is smaller. Its highlights are as follows:
ResNet introduced the residual structure: the input is first reduced in dimension, convolved, and then expanded again, a bottleneck that is wide at both ends and narrow in the middle, using the ReLU activation.
MobileNet v2 introduces the inverted residual structure: the input is first expanded, passed through a DW convolution, and then projected back down, so it is narrow at both ends and wide in the middle, and it uses the ReLU6 activation.
The ReLU6 activation is plotted below: an ordinary ReLU outputs 0 for negative inputs and passes positive inputs through unchanged; ReLU6 outputs 0 for negative inputs, passes inputs between 0 and 6 through unchanged, and clips anything above 6 to 6.
In each MobileNet v2 block, the last 1x1 convolution uses a linear activation instead of ReLU, because ReLU causes a large information loss on low-dimensional features.
The MobileNet v2 block is as follows:
Suppose the input has height h, width w and k channels. A 1x1 convolution first expands it to h x w x (tk), where t is the expansion factor; a 3x3 DW convolution follows, which does not change the channel count, giving (h/s) x (w/s) x (tk), where s is the stride; finally another 1x1 convolution projects it down to (h/s) x (w/s) x k', where k' is the specified output channel count.
Note that the shortcut connection exists only when stride = 1 and the input and output feature maps have the same shape; only then can they be added. When stride = 2 the shapes differ and, unlike ResNet, there is no extra transformation of the input on the shortcut branch, so no shortcut is used.
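A minimal sketch of the inverted residual block described above (1x1 expansion + ReLU6, 3x3 depthwise + ReLU6, then a linear 1x1 projection); the layer ordering follows the description, while the concrete sizes in the usage example are illustrative assumptions:

import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_channels, out_channels, stride, expand_ratio):
        super().__init__()
        hidden = in_channels * expand_ratio
        self.use_shortcut = stride == 1 and in_channels == out_channels
        layers = []
        if expand_ratio != 1:
            layers += [nn.Conv2d(in_channels, hidden, kernel_size=1, bias=False),
                       nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True)]          # 1x1 expansion
        layers += [nn.Conv2d(hidden, hidden, kernel_size=3, stride=stride, padding=1,
                             groups=hidden, bias=False),                        # 3x3 DW convolution
                   nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
                   nn.Conv2d(hidden, out_channels, kernel_size=1, bias=False),  # linear 1x1 projection
                   nn.BatchNorm2d(out_channels)]
        self.conv = nn.Sequential(*layers)
    def forward(self, x):
        out = self.conv(x)
        return x + out if self.use_shortcut else out

x = torch.randn(1, 32, 56, 56)
print(InvertedResidual(32, 32, stride=1, expand_ratio=6)(x).shape)  # torch.Size([1, 32, 56, 56])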
The MobileNet v2 architecture is as follows:
Experiments show that, compared with MobileNet v1 and other networks, MobileNet v2 achieves better accuracy with fewer parameters on both image classification and object detection benchmarks.
Personal Summary
This week I mainly studied image-processing methods and theory, and used several networks for image classification. Next week I will continue with other models, algorithms and theory, read the related papers, and combine theory with practice.