YOLO11改进|注意力机制篇|引入三重注意力机制Triplet Attention

news2025/7/8 20:44:25

在这里插入图片描述

- 一、【Triplet Attention】注意力机制
- - 1.1【Triplet Attention】注意力介绍
  - 1.2【Triplet Attention】核心代码
- 二、添加【Triplet Attention】注意力机制
- - 2.1STEP1
  - 2.2STEP2
  - 2.3STEP3
  - 2.4STEP4
- 三、yaml文件与运行
- - 3.1yaml文件
  - 3.2运行成功截图

一、【Triplet Attention】注意力机制

1.1【Triplet Attention】注意力介绍

在这里插入图片描述

下图是【Triplet Attention】的结构图，让我们简单分析一下运行过程和优势

处理过程：

通道池化与卷积：
左侧的分支首先对输入特征进行 Channel Pooling，通过通道维度的池化操作，压缩特征图的空间维度，随后应用一个 7×7的卷积层提取多尺度特征。
卷积后的特征图经过批归一化（Batch Normalization）和 Sigmoid 激活生成通道加权系数。
空间维度变换与卷积：
中间和右侧的分支分别进行 Permute 操作，对输入特征的维度进行转换（类似转置操作），接着应用 Z-Pooling，进一步压缩空间维度。
经过 7×7卷积和批归一化处理后，这些分支也生成了加权特征，并通过 Sigmoid 激活，生成空间加权系数。
特征融合：
三个并行路径生成的加权特征通过加法操作（+）和通道平均池化（Avg）进行融合，得到最终的权重。
最终的输出特征是经过加权操作的特征图，其中每个特征通道和空间位置都被自适应加权。
优势：
多尺度特征提取：
通过不同路径的池化和卷积操作，模块能够从多个尺度上提取特征。这种多尺度处理使得模型在不同的空间分辨率下能够学习到更丰富的上下文信息，提高了对不同目标的敏感度。
自适应加权：
模块通过通道和空间上的加权机制，使得每个通道和空间位置都根据特征的重要性被自适应调整。这使得网络能够更加关注有用的特征，抑制冗余或不相关的信息，从而提升了模型的表达能力和泛化性。
特征融合：
通过不同路径的特征融合（加法操作和平均池化），该模块能够同时整合多维度的信息，提高了特征的多样性，增强了模型处理复杂场景的能力。
有效的通道和空间交互：
通过引入通道池化和维度转置，模型实现了通道和空间维度之间的有效交互，使得网络能够同时捕捉到全局和局部的信息。

在这里插入图片描述

1.2【Triplet Attention】核心代码

import torch
import torch.nn as nn


class BasicConv(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1, relu=True,
                 bn=True, bias=False):
        super(BasicConv, self).__init__()
        self.out_channels = out_planes
        self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding,
                              dilation=dilation, groups=groups, bias=bias)
        self.bn = nn.BatchNorm2d(out_planes, eps=1e-5, momentum=0.01, affine=True) if bn else None
        self.relu = nn.ReLU() if relu else None

    def forward(self, x):
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x


class ZPool(nn.Module):
    def forward(self, x):
        return torch.cat((torch.max(x, 1)[0].unsqueeze(1), torch.mean(x, 1).unsqueeze(1)), dim=1)


class AttentionGate(nn.Module):
    def __init__(self):
        super(AttentionGate, self).__init__()
        kernel_size = 7
        self.compress = ZPool()
        self.conv = BasicConv(2, 1, kernel_size, stride=1, padding=(kernel_size - 1) // 2, relu=False)

    def forward(self, x):
        x_compress = self.compress(x)
        x_out = self.conv(x_compress)
        scale = torch.sigmoid_(x_out)
        return x * scale


class TripletAttention(nn.Module):
    def __init__(self, no_spatial=False):
        super(TripletAttention, self).__init__()
        self.cw = AttentionGate()
        self.hc = AttentionGate()
        self.no_spatial = no_spatial
        if not no_spatial:
            self.hw = AttentionGate()

    def forward(self, x):
        x_perm1 = x.permute(0, 2, 1, 3).contiguous()
        x_out1 = self.cw(x_perm1)
        x_out11 = x_out1.permute(0, 2, 1, 3).contiguous()
        x_perm2 = x.permute(0, 3, 2, 1).contiguous()
        x_out2 = self.hc(x_perm2)
        x_out21 = x_out2.permute(0, 3, 2, 1).contiguous()
        if not self.no_spatial:
            x_out = self.hw(x)
            x_out = 1 / 3 * (x_out + x_out11 + x_out21)
        else:
            x_out = 1 / 2 * (x_out11 + x_out21)
        return x_out

二、添加【Triplet Attention】注意力机制

2.1STEP1

首先找到ultralytics/nn文件路径下新建一个Add-module的python文件包【这里注意一定是python文件包，新建后会自动生成_init_.py】，如果已经跟着我的教程建立过一次了可以省略此步骤，随后新建一个TripletAttention.py文件并将上文中提到的注意力机制的代码全部粘贴到此文件中，如下图所示在这里插入图片描述

2.2STEP2

在STEP1中新建的_init_.py文件中导入增加改进模块的代码包如下图所示在这里插入图片描述

2.3STEP3

找到ultralytics/nn文件夹中的task.py文件，在其中按照下图添加在这里插入图片描述

2.4STEP4

定位到ultralytics/nn文件夹中的task.py文件中的def parse_model(d, ch, verbose=True): # model_dict, input_channels(3)函数添加如图代码,【如果不好定位可以直接ctrl+f搜索定位】

在这里插入图片描述

三、yaml文件与运行

3.1yaml文件

以下是添加【Triplet Attention】注意力机制在Backbone中的yaml文件，大家可以注释自行调节，效果以自己的数据集结果为准

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128,3,2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256,3,2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512,3,2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024,3,2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1,1,TripletAttention,[]]
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 14], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 11], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[17, 20, 23], 1, Detect, [nc]] # Detect(P3, P4, P5)