Deep Learning for Object Detection: Combining a Residual Network with FPN

  • Feature pyramids
  • Multi-scale fusion
  • How a feature pyramid network works
  • This implementation combines ResNet with FPN: the feature maps from ResNet's four stages are fused following the FPN top-down design to build resnet_fpn, which strengthens the object-detection backbone. The sketch below illustrates the core fusion step.
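    A minimal sketch of one fusion step, assuming two adjacent ResNet-50 stage outputs (the tensor names and channel counts are illustrative, not part of the implementation below):

import torch
import torch.nn.functional as F
from torch import nn

c4 = torch.randn(1, 1024, 14, 14)  # layer3 output (finer resolution)
c5 = torch.randn(1, 2048, 7, 7)    # layer4 output (coarsest)

lateral4 = nn.Conv2d(1024, 256, kernel_size=1)  # 1x1 lateral conv: align channels
lateral5 = nn.Conv2d(2048, 256, kernel_size=1)
smooth4 = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # 3x3 conv smooths the fused map

p5 = lateral5(c5)
# upsample the coarser level to the finer level's size, then add the lateral map
p4 = smooth4(lateral4(c4) + F.interpolate(p5, size=c4.shape[-2:], mode="nearest"))
print(p4.shape)  # torch.Size([1, 256, 14, 14])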
  • The full code implementation follows:
import torch
from torch import Tensor
from collections import OrderedDict
import torch.nn.functional as F
from torch import nn
from typing import Tuple, List, Dict  # torch.jit.annotations is deprecated; typing works for scripting


class Bottleneck(nn.Module):
    expansion = 4
    def __init__(self, in_channel, out_channel, stride=1, downsample=None, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=(1,1), stride=(1,1), bias=False)  # squeeze channels
        self.bn1 = norm_layer(out_channel)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=(3,3), stride=(stride,stride), bias=False, padding=(1,1))
        self.bn2 = norm_layer(out_channel)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
                               kernel_size=(1,1), stride=(1,1), bias=False)  # unsqueeze channels
        self.bn3 = norm_layer(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out
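
# Shape walk-through for one Bottleneck (illustrative numbers, assuming
# in_channel=256, out_channel=64, stride=1):
#   (N, 256, H, W) --conv1 1x1--> (N, 64, H, W)
#                  --conv2 3x3--> (N, 64, H, W)
#                  --conv3 1x1--> (N, 256, H, W)   # 64 * expansion
# and the result is added to the (possibly downsampled) identity branch.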


class ResNet(nn.Module):

    def __init__(self, block, blocks_num, num_classes=1000, include_top=True, norm_layer=None):
        '''
        :param block: residual block class (e.g. Bottleneck)
        :param blocks_num: number of blocks in each of the four stages
        :param num_classes: number of classification classes
        :param include_top: whether to build the classification head (avgpool + fc)
        :param norm_layer: normalization layer (defaults to BatchNorm2d)
        '''
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.include_top = include_top
        self.in_channel = 64

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.in_channel, kernel_size=(7,7), stride=(2,2),
                               padding=(3,3), bias=False)
        self.bn1 = norm_layer(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)

        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        norm_layer = self._norm_layer
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=(1,1), stride=(stride,stride), bias=False),
                norm_layer(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel, channel, downsample=downsample,
                            stride=stride, norm_layer=norm_layer))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel, norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def forward(self, x):

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x
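
# With Bottleneck blocks and blocks_num=[3, 4, 6, 3] (ResNet-50), the stage
# outputs for an input of size (N, 3, H, W) are:
#   layer1: (N,  256, H/4,  W/4)     layer2: (N,  512, H/8,  W/8)
#   layer3: (N, 1024, H/16, W/16)    layer4: (N, 2048, H/32, W/32)
# These four maps are exactly what the FPN below consumes.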


class IntermediateLayerGetter(nn.ModuleDict):
    """
    Module wrapper that returns intermediate layers from a model.
    It has a strong assumption that the modules have been registered
    into the model in the same order as they are used.
    This means that one should **not** reuse the same nn.Module
    twice in the forward if you want this to work.
    Additionally, it is only able to query submodules that are directly
    assigned to the model. So if `model` is passed, `model.feature1` can
    be returned, but not `model.feature1.layer2`.
    Arguments:
        model (nn.Module): model from which we will extract the features
        return_layers (Dict[name, new_name]): a dict containing the names
            of the modules for which the activations will be returned as
            the key of the dict, and the value of the dict is the name
            of the returned activation (which the user can specify).
    """
    __annotations__ = {
        "return_layers": Dict[str, str],
    }

    def __init__(self, model, return_layers):

        if not set(return_layers).issubset([name for name, _ in model.named_children()]):
            raise ValueError("return_layers are not present in model")
        # {'layer1': '0', 'layer2': '1', 'layer3': '2', 'layer4': '3'}
        orig_return_layers = return_layers
        return_layers = {k: v for k, v in return_layers.items()}
        layers = OrderedDict()

        # iterate over the model's children in order, storing them in an OrderedDict;
        # keep everything up to the last requested layer (layer4) and drop the rest
        for name, module in model.named_children():
            layers[name] = module
            if name in return_layers:
                del return_layers[name]
            if not return_layers:
                break

        super(IntermediateLayerGetter, self).__init__(layers)
        self.return_layers = orig_return_layers

    def forward(self, x):
        out = OrderedDict()
        # run the submodules in order, collecting the outputs of
        # layer1, layer2, layer3 and layer4
        for name, module in self.named_children():
            x = module(x)
            if name in self.return_layers:
                out_name = self.return_layers[name]
                out[out_name] = x
        return out
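
# Example (illustrative): wrapping a ResNet with
#   IntermediateLayerGetter(resnet, {'layer1': '0', 'layer4': '3'})
# yields, on forward, an OrderedDict with keys '0' and '3' mapped to the
# outputs of layer1 and layer4 respectively.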


class FeaturePyramidNetwork(nn.Module):
    """
    Module that adds an FPN on top of a set of feature maps. This is based on
    `"Feature Pyramid Networks for Object Detection" <https://arxiv.org/abs/1612.03144>`_.
    The feature maps are currently supposed to be in increasing depth
    order.
    The input to the model is expected to be an OrderedDict[Tensor], containing
    the feature maps on top of which the FPN will be added.
    Arguments:
        in_channels_list (list[int]): number of channels for each feature map that
            is passed to the module
        out_channels (int): number of channels of the FPN representation
        extra_blocks (ExtraFPNBlock or None): if provided, extra operations will
            be performed. It is expected to take the fpn features, the original
            features and the names of the original features as input, and returns
            a new list of feature maps and their corresponding names
    """

    def __init__(self, in_channels_list, out_channels, extra_blocks=None):
        super(FeaturePyramidNetwork, self).__init__()
        # 1x1 convs that project each ResNet stage output (layer1-4) to out_channels
        self.inner_blocks = nn.ModuleList()
        # 3x3 convs applied to the fused maps to produce the prediction feature maps
        self.layer_blocks = nn.ModuleList()
        for in_channels in in_channels_list:
            if in_channels == 0:
                continue
            inner_block_module = nn.Conv2d(in_channels, out_channels, (1,1))
            layer_block_module = nn.Conv2d(out_channels, out_channels, (3,3), padding=(1,1))
            self.inner_blocks.append(inner_block_module)
            self.layer_blocks.append(layer_block_module)

        # initialize parameters now to avoid modifying the initialization of top_blocks
        for m in self.children():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight, a=1)
                nn.init.constant_(m.bias, 0)

        self.extra_blocks = extra_blocks

    def get_result_from_inner_blocks(self, x, idx):
        # type: (Tensor, int) -> Tensor
        """
        This is equivalent to self.inner_blocks[idx](x),
        but torchscript doesn't support this yet
        """
        num_blocks = len(self.inner_blocks)
        if idx < 0:
            idx += num_blocks
        i = 0
        out = x
        for module in self.inner_blocks:
            if i == idx:
                out = module(x)
            i += 1
        return out

    def get_result_from_layer_blocks(self, x, idx):
        # type: (Tensor, int) -> Tensor
        """
        This is equivalent to self.layer_blocks[idx](x),
        but torchscript doesn't support this yet
        """
        num_blocks = len(self.layer_blocks)
        if idx < 0:
            idx += num_blocks
        i = 0
        out = x
        for module in self.layer_blocks:
            if i == idx:
                out = module(x)
            i += 1
        return out

    def forward(self, x):
        # type: (Dict[str, Tensor]) -> Dict[str, Tensor]
        """
        Computes the FPN for a set of feature maps.
        Arguments:
            x (OrderedDict[Tensor]): feature maps for each feature level.
        Returns:
            results (OrderedDict[Tensor]): feature maps after FPN layers.
                They are ordered from highest resolution first.
        """
        # unpack OrderedDict into two lists for easier handling
        names = list(x.keys())
        x = list(x.values())

        # project the layer4 output to out_channels
        # last_inner = self.inner_blocks[-1](x[-1])
        last_inner = self.get_result_from_inner_blocks(x[-1], -1)
        # results holds the prediction feature map of each pyramid level
        results = []
        # a 3x3 conv on the projected layer4 map gives the coarsest prediction feature
        # results.append(self.layer_blocks[-1](last_inner))
        results.append(self.get_result_from_layer_blocks(last_inner, -1))

        # walk the remaining ResNet outputs in reverse (layer3 -> layer2 -> layer1;
        # layer4 was handled above), fusing each with the upsampled top-down map

        for idx in range(len(x) - 2, -1, -1):
            inner_lateral = self.get_result_from_inner_blocks(x[idx], idx)
            feat_shape = inner_lateral.shape[-2:]
            inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")
            last_inner = inner_lateral + inner_top_down
            results.insert(0, self.get_result_from_layer_blocks(last_inner, idx))

        # optionally add an extra pyramid level on top of the layer4 prediction map
        if self.extra_blocks is not None:
            results, names = self.extra_blocks(results, names)

        # make it back an OrderedDict
        out = OrderedDict([(k, v) for k, v in zip(names, results)])

        return out


class LastLevelMaxPool(torch.nn.Module):
    """
    Applies a max_pool2d on top of the last feature map
    """

    def forward(self, x, names):
        # type: (List[Tensor], List[str]) -> Tuple[List[Tensor], List[str]]
        names.append("pool")
        x.append(F.max_pool2d(x[-1], 1, 2, 0))
        return x, names
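
# Note: max_pool2d with kernel_size=1, stride=2, padding=0 simply subsamples
# every other pixel, so "pool" is a 2x-downsampled copy of the coarsest FPN map
# (stride 64 for a standard ResNet-50), often called P6 in detection papers.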


class BackboneWithFPN(nn.Module):
    """
    Adds an FPN on top of a model.
    Internally, it uses torchvision.models._utils.IntermediateLayerGetter to
    extract a submodel that returns the feature maps specified in return_layers.
    The same limitations of IntermediateLayerGetter apply here.
    Arguments:
        backbone (nn.Module)
        return_layers (Dict[name, new_name]): a dict containing the names
            of the modules for which the activations will be returned as
            the key of the dict, and the value of the dict is the name
            of the returned activation (which the user can specify).
        in_channels_list (List[int]): number of channels for each feature map
            that is returned, in the order they are present in the OrderedDict
        out_channels (int): number of channels in the FPN.
    Attributes:
        out_channels (int): the number of channels in the FPN
    """

    def __init__(self, backbone, return_layers, in_channels_list, out_channels):
        '''
        :param backbone: backbone network whose feature maps are extracted
        :param return_layers: mapping from backbone layer names to output names
        :param in_channels_list: channel count of each returned feature map
        :param out_channels: channel count of every FPN output map
        '''
        super(BackboneWithFPN, self).__init__()
        # wrap the backbone so it returns an OrderedDict of the requested feature maps
        self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)

        self.fpn = FeaturePyramidNetwork(
            in_channels_list=in_channels_list,
            out_channels=out_channels,
            extra_blocks=LastLevelMaxPool(),
            )
        self.out_channels = out_channels

    def forward(self, x):
        x = self.body(x)
        x = self.fpn(x)
        return x
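
# Downstream detection heads (e.g. torchvision's FasterRCNN) read the
# `out_channels` attribute to size their inputs; every pyramid level returned
# by forward() has this many channels.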


def resnet50_fpn_backbone():
    # FrozenBatchNorm2d behaves like BatchNorm2d, but its parameters are frozen
    # and never updated; pass it as norm_layer when fine-tuning a detector:
    # norm_layer=misc.FrozenBatchNorm2d
    resnet_backbone = ResNet(Bottleneck, [3, 4, 6, 3],
                             include_top=False)

    # freeze layer1 and everything before it (these capture generic low-level features)
    for name, parameter in resnet_backbone.named_parameters():
        if 'layer2' not in name and 'layer3' not in name and 'layer4' not in name:
            # frozen weights take no part in training
            parameter.requires_grad_(False)
    # map each backbone stage name to the key used in the output OrderedDict
    return_layers = {'layer1': '0', 'layer2': '1', 'layer3': '2', 'layer4': '3'}

    # after construction, resnet_backbone.in_channel == 2048 (layer4's output
    # channels), so in_channels_stage2 = 2048 // 8 = 256 (layer1's output channels)
    in_channels_stage2 = resnet_backbone.in_channel // 8

    in_channels_list = [
        in_channels_stage2,  # layer1 out_channel=256
        in_channels_stage2 * 2,  # layer2 out_channel=512
        in_channels_stage2 * 4,  # layer3 out_channel=1024
        in_channels_stage2 * 8,  # layer4 out_channel=2048
    ]
    out_channels = 256
    return BackboneWithFPN(resnet_backbone, return_layers, in_channels_list, out_channels)


if __name__ == '__main__':
    net = resnet50_fpn_backbone()
    x = torch.randn(1, 3, 224, 224)
    for key, value in net(x).items():
        print(key, value.shape)

  • Test results
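    For the 224×224 input above, the expected output (shapes follow from the stage strides noted earlier; the pyramid levels are keyed '0'-'3' plus 'pool'):

0 torch.Size([1, 256, 56, 56])
1 torch.Size([1, 256, 28, 28])
2 torch.Size([1, 256, 14, 14])
3 torch.Size([1, 256, 7, 7])
pool torch.Size([1, 256, 4, 4])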
