3.2. Fine-Tuning

news2025/4/26 14:18:42

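For datasets with a limited number of samples, a large model may overfit quickly, while a small model may not perform well. One solution is to collect more data, but in many cases this is hard to do.

Another approach is transfer learning, which transfers the knowledge learned from a source dataset to a target dataset. For example, suppose we only want to recognize chairs and have just 100 chairs, each photographed in 1,000 images from different angles. Although most images in the ImageNet dataset have nothing to do with chairs, a model trained on that dataset may extract more general image features (roughly speaking, the lower the layer, the more general the features it extracts). These features help identify edges, textures, shapes, and object compositions, and may be equally effective for recognizing chairs.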

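1. Steps

Fine-tuning is a common technique in transfer learning. It consists of the following steps:

1) Pretrain a neural network model, the source model, on the source dataset (for example, ImageNet).
2) Create a new neural network model, the target model. It copies the entire design of the source model and all of its parameters except the output layer. We assume that these parameters contain the knowledge learned from the source dataset and that this knowledge also applies to the target dataset. We also assume that the output layer of the source model is closely tied to the labels of the source dataset, so it is not used in the target model.
3) Add an output layer to the target model whose number of outputs equals the number of classes in the target dataset, and randomly initialize the parameters of this layer.
4) Train the target model on the target dataset (for example, the chair dataset). The output layer is trained from scratch, while the parameters of all other layers are fine-tuned starting from the parameters of the source model.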
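
Steps 2 and 3 can be sketched on a toy network. This is only a minimal illustration, not the ResNet workflow used later in this post: `source_model`, its layer sizes, and `num_target_classes` are hypothetical stand-ins, and the source model here is randomly initialized where a real one would be pretrained.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained source model (10 source classes).
source_model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 10),  # output layer tied to the source labels
)

# Step 2: copy the design and the parameters of the source model.
target_model = copy.deepcopy(source_model)

# Step 3: replace the output layer with a randomly initialized one whose
# number of outputs equals the number of target classes.
num_target_classes = 2  # assumed for illustration
in_features = target_model[-1].in_features
target_model[-1] = nn.Linear(in_features, num_target_classes)
nn.init.xavier_uniform_(target_model[-1].weight)

# The copied layer holds the same values; the new head has the new shape.
copied = torch.equal(target_model[0].weight, source_model[0].weight)
out_shape = tuple(target_model(torch.randn(4, 8)).shape)
print(copied, out_shape)
```

Using `copy.deepcopy` keeps the two models independent, so training the target model does not modify the source model.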

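1.1 Training the target model

This is an ordinary training task on the target dataset, but with stronger regularization, so that the parameters do not move far from their pretrained values:

A smaller learning rate
Fewer passes over the data (fewer epochs)

Fine-tuning usually works better when the source dataset is much more complex than the target dataset.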
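
The regularizing effect of a smaller learning rate and fewer iterations can be seen on a toy problem. The sketch below is an assumption-laden illustration (the helper `drift_after_training`, the linear model, and the synthetic data are all made up for this purpose): it measures how far the weights move from their starting values under two training budgets.

```python
import torch
from torch import nn


def drift_after_training(lr, num_steps):
    """Train a toy linear model and return how far its weights moved."""
    torch.manual_seed(0)  # same data and same initial weights for every call
    X = torch.randn(64, 10)
    y = X @ torch.ones(10, 1)  # hypothetical regression target
    model = nn.Linear(10, 1)
    w0 = model.weight.detach().clone()  # stands in for "pretrained" weights
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss_fn(model(X), y).backward()
        optimizer.step()
    # L2 distance between the final and the initial weights
    return (model.weight.detach() - w0).norm().item()


small = drift_after_training(lr=0.001, num_steps=5)
large = drift_after_training(lr=0.1, num_steps=50)
print(f"small lr, few steps: {small:.4f}; large lr, many steps: {large:.4f}")
```

The first setting leaves the weights much closer to where they started, which is the sense in which fine-tuning with a small learning rate and few epochs constrains the model.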

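1.2 Reusing classifier weights

Sometimes the source dataset already contains some of the labels in the target data; for example, ImageNet may include a chair label. In that case, the output layer of the target model can be initialized with the weight vectors of the corresponding labels in the pretrained classifier (that is, they are copied directly).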
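
A minimal sketch of this copy, under stated assumptions: `pretrained_fc` is a randomly initialized stand-in with the same shape as ResNet-18's `fc` layer (512 inputs, 1000 classes), and both `shared_source_idx` and `shared_target_idx` are assumed values that would have to be looked up in the real label lists.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the output layer of a pretrained classifier.
pretrained_fc = nn.Linear(512, 1000)

# Assumed index of the shared label in the source dataset; check the label
# list of the actual pretrained model before relying on this value.
shared_source_idx = 934
# Assumed index of the same label in the target dataset.
shared_target_idx = 0

# New output layer for a two-class target task, randomly initialized.
new_fc = nn.Linear(pretrained_fc.in_features, 2)
nn.init.xavier_uniform_(new_fc.weight)

# Overwrite the row (and bias) of the shared label with the pretrained values.
with torch.no_grad():
    new_fc.weight[shared_target_idx].copy_(pretrained_fc.weight[shared_source_idx])
    new_fc.bias[shared_target_idx].copy_(pretrained_fc.bias[shared_source_idx])

print(torch.equal(new_fc.weight[shared_target_idx],
                  pretrained_fc.weight[shared_source_idx]))
```

Only the row for the shared label is reused; the rows for labels that do not exist in the source dataset keep their random initialization.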

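1.3 Freezing some layers

In general, low-level features in a neural network are more general, while high-level features are more specific to the dataset.

The parameters of some of the bottom layers can therefore be frozen so that they are not updated, which also acts as a stronger form of regularization.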
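
One way to freeze layers in PyTorch is to turn off `requires_grad` for their parameters and give the optimizer only the remaining ones. The sketch below uses a hypothetical toy network, not the model trained later in this post, and checks that one optimizer step leaves the frozen layer untouched.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy network: the first Linear plays the role of the "bottom"
# layers, the last Linear the role of the output layer.
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# Freeze the bottom layer: its parameters no longer receive gradients.
for param in net[0].parameters():
    param.requires_grad_(False)

frozen_before = net[0].weight.detach().clone()
head_before = net[2].weight.detach().clone()

# Pass only the trainable parameters to the optimizer.
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

# One training step on random data.
X, y = torch.randn(32, 8), torch.randint(0, 2, (32,))
loss = nn.CrossEntropyLoss()(net(X), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

frozen_unchanged = torch.equal(net[0].weight, frozen_before)
head_changed = not torch.equal(net[2].weight, head_before)
print(frozen_unchanged, head_changed)
```

The same pattern should carry over to a torchvision ResNet by looping over the parameters of its early blocks (for example `conv1` and `layer1`), although which layers to freeze is a choice to validate on the target data.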

2. Hot dog recognition

import os
import torch
import torchvision
from torch import nn
from d2l import torch as d2l
import torch_directml

device = torch_directml.device()
# @save
d2l.DATA_HUB['hotdog'] = (d2l.DATA_URL + 'hotdog.zip',
                          'fba480ffa8aa7e0febbb511d181409f899b9baa5')

data_dir = d2l.download_extract('hotdog')

train_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'train'))
test_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'test'))

hotdogs = [train_imgs[i][0] for i in range(8)]
not_hotdogs = [train_imgs[-i - 1][0] for i in range(8)]
d2l.show_images(hotdogs + not_hotdogs, 2, 8, scale=1.4)
d2l.plt.show()

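# Standardize each channel with the per-channel RGB means and standard deviations,
# because the pretrained model was trained with this normalization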
normalize = torchvision.transforms.Normalize(
    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

# Randomly crop a region and resize it to 224 x 224, the input size the pretrained model expects
train_augs = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(224),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    normalize])
# Scale the height and width to 256 pixels, then crop the central 224 x 224 region as input
test_augs = torchvision.transforms.Compose([
    torchvision.transforms.Resize([256, 256]),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    normalize])

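# Download the model; the `pretrained` argument is deprecated, so use `weights` to get the pretrained model
pretrained_net = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
print(pretrained_net.fc)  # the last layer of the pretrained model is the output layer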

finetune_net = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1).to(device)
finetune_net.fc = nn.Linear(finetune_net.fc.in_features, 2).to(device)
nn.init.xavier_uniform_(finetune_net.fc.weight)


def train_batch_ch13(net, X, y, loss, trainer, devices):
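    """Train on one minibatch (adapted from the multi-GPU version; `devices` is a single device here)"""
    if isinstance(X, list):
        # Needed for fine-tuning BERT, where X is a list of tensors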
        X = [x.to(devices) for x in X]
    else:
        X = X.to(devices)
    y = y.to(devices)
    net.train()
    trainer.zero_grad()
    pred = net(X)
    l = loss(pred, y)
    l.sum().backward()
    trainer.step()
    train_loss_sum = l.sum()
    train_acc_sum = d2l.accuracy(pred, y)
    return train_loss_sum, train_acc_sum


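# @save Originally multi-GPU: the `devices` list parameter is replaced by a single `device`
def train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs,
               device):
    """Train the model on a single device (adapted from the multi-GPU version)"""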
    timer, num_batches = d2l.Timer(), len(train_iter)
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
                            legend=['train loss', 'train acc', 'test acc'])
    # net = nn.DataParallel(net, device_ids=devices).to(devices[0])
    for epoch in range(num_epochs):
        # Four accumulators: sum of training loss, number of correct training
        # predictions, number of examples, number of labels
        metric = d2l.Accumulator(4)
        for i, (features, labels) in enumerate(train_iter):
            timer.start()
            l, acc = train_batch_ch13(
                net, features, labels, loss, trainer, device)
            metric.add(l, acc, labels.shape[0], labels.numel())
            timer.stop()
            # max(1, ...) avoids a modulo-by-zero when there are fewer than 5 batches
            if (i + 1) % max(1, num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (metric[0] / metric[2], metric[1] / metric[3],
                              None))
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
    print(f'loss {metric[0] / metric[2]:.3f}, train acc '
          f'{metric[1] / metric[3]:.3f}, test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec on '
          f'{str(device)}')


# Fine-tuning
# If param_group=True, the parameters of the output layer use a learning rate ten times larger
def train_fine_tuning(net, learning_rate, batch_size=128, num_epochs=5,
                      param_group=True):
    train_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'train'), transform=train_augs),
        batch_size=batch_size, shuffle=True)
    test_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'test'), transform=test_augs),
        batch_size=batch_size)
    # devices = d2l.try_all_gpus()
    device = torch_directml.device()
    loss = nn.CrossEntropyLoss(reduction="none")
    if param_group:
        params_1x = [param for name, param in net.named_parameters()
                     if name not in ["fc.weight", "fc.bias"]]
        # Use a 10x learning rate for the final layer
        trainer = torch.optim.SGD([{'params': params_1x},
                                   {'params': net.fc.parameters(),
                                    'lr': learning_rate * 10}],
                                  lr=learning_rate, weight_decay=0.001)
    else:
        trainer = torch.optim.SGD(net.parameters(), lr=learning_rate,
                                  weight_decay=0.001)

    train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs,
               device)


train_fine_tuning(finetune_net, 5e-5)
d2l.plt.show()

loss 0.270, train acc 0.899, test acc 0.948
232.6 examples/sec on privateuseone:0

Accuracy is already high after the first epoch, and the subsequent training is smooth, with no sign of overfitting.

[Figure: training curves of the fine-tuned model]

If all parameters are initialized to random values instead:
[Figure: training curves with random initialization]
