YOLOv11改进 | 注意力篇 | YOLOv11引入GAM注意力机制

news2025/7/14 10:16:08

1.GAM介绍

摘要：为了提高各种计算机视觉任务的性能，人们研究了各种注意机制。然而，现有的方法忽略了保留通道和空间信息以增强跨维交互的重要性。因此，我们提出了一种通过减少信息减少和放大全球交互表示来提高深度神经网络性能的全球驻留机制。我们引入了具有多层单个Ceptron的3D置换用于信道注意，同时还引入了卷积空间注意子模块。对 CIFAR-100和lmageNet-1K上图像分类任务的拟议机制的评估表明我们的方法稳定地优于ResNet和轻量级的 MobileNet的几个最近的注意机制。

官方论文地址：https://ar5iv.labs.arxiv.org/html/2112.05561

官方代码地址：https://github.com/dengbuqi/GAM_Pytorch/blob/main/CAM.py

简单介绍: GAM旨在通过设计一种机制，减少信息损失并放大全局维度互动特征，从而解决传统注意力机制在通道和空间两个维度上保留信息不足的问题。GAM采用了顺序的通道-空间注意力制，并对子模块进行了重新设计。具体来说，通道注意力子模块使用3D排列来跨三个维度保留信息，并通过一个两层的MLP增强跨维度的通道-空间依赖性。在空间注意力子模块中，为了更好地关注空间信息,采用了两个卷积层进行空间信息融合，同时去除了可能导致信息减少的最大池化操作。

GAM模块结构图如下：

2.核心代码

import torch
import torch.nn as nn
 
 
class GAM(nn.Module):
    def __init__(self, in_channels, rate=4):
        super().__init__()
        out_channels = in_channels
        in_channels = int(in_channels)
        out_channels = int(out_channels)
        inchannel_rate = int(in_channels/rate)
 
 
        self.linear1 = nn.Linear(in_channels, inchannel_rate)
        self.relu = nn.ReLU(inplace=True)
        self.linear2 = nn.Linear(inchannel_rate, in_channels)
        
 
        self.conv1=nn.Conv2d(in_channels, inchannel_rate,kernel_size=7,padding=3,padding_mode='replicate')
 
        self.conv2=nn.Conv2d(inchannel_rate, out_channels,kernel_size=7,padding=3,padding_mode='replicate')
 
        self.norm1 = nn.BatchNorm2d(inchannel_rate)
        self.norm2 = nn.BatchNorm2d(out_channels)
        self.sigmoid = nn.Sigmoid()
 
    def forward(self,x):
        b, c, h, w = x.shape
        # B,C,H,W ==> B,H*W,C
        x_permute = x.permute(0, 2, 3, 1).view(b, -1, c)
        
        # B,H*W,C ==> B,H,W,C
        x_att_permute = self.linear2(self.relu(self.linear1(x_permute))).view(b, h, w, c)
 
        # B,H,W,C ==> B,C,H,W
        x_channel_att = x_att_permute.permute(0, 3, 1, 2)
 
        x = x * x_channel_att
 
        x_spatial_att = self.relu(self.norm1(self.conv1(x)))
        x_spatial_att = self.sigmoid(self.norm2(self.conv2(x_spatial_att)))
        
        out = x * x_spatial_att
 
        return out
 
if __name__ == '__main__':
    img = torch.rand(1,64,32,48)
    b, c, h, w = img.shape
    net = GAM(in_channels=c, out_channels=c)
    output = net(img)
    print(output.shape)

3.YOLOv11中添加GAM方式

3.1 在ultralytics/nn下新建Extramodule

3.2 在Extramodule里创建GAM

在GAM.py文件里添加给出的GAM代码

添加完GAM代码后，在ultralytics/nn/Extramodule/__init__.py文件中引用

3.3 在task.py里引用

在ultralytics/nn/tasks.py文件里引用Extramodule

在tasks.py找到parse_model（ctrl+f可以直接搜索parse_model位置）

添加如下代码：

        elif m in {GAM}:
            c2 = ch[f]
            args = [c2, *args]

4.新建一个yolo11GAM.yaml文件

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
  - [-1, 1, GAM, []]

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, GAM, []]

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
  - [-1, 1, GAM, []]

  - [[16, 20, 25], 1, Detect, [nc]] # Detect(P3, P4, P5)

大家根据自己的数据集实际情况，修改nc大小。

5.模型训练

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO(r'D:\yolo\yolov11\ultralytics-main\datasets\yolo11GAM.yaml')
    model.train(data=r'D:\yolo\yolov11\ultralytics-main\datasets\data.yaml',
                cache=False,
                imgsz=640,
                epochs=100,
                single_cls=False,  # 是否是单类别检测
                batch=8,
                close_mosaic=10,
                workers=0,
                device='0',
                optimizer='SGD',
                amp=True,
                project='runs/train',
                name='exp',
                )

模型结构打印，成功运行：