💡💡💡All programs in this column have been tested and run successfully💡💡💡
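Column directory: introduction and table of contents of the column 《YOLOv8改进有效涨点》 (YOLOv8 improvements for effective accuracy gains) | 100+ articles so far, covering improvements to detection heads, loss functions, backbones, necks, NMS, and more.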
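Spatial attention improves the performance of convolutional neural networks, but it has limitations. This article introduces Receptive-Field Attention (RFA), a new attention mechanism that addresses the parameter-sharing problem of large convolution kernels. RFA focuses on receptive-field spatial features and provides effective attention weights for large kernels; according to the paper, the resulting RFAConv operation adds almost no computational cost while clearly improving network performance. After explaining the main principle, the article walks step by step through adding and modifying the module code, and the complete modified code is shared at the end so it can be run in one go; even beginners can follow along. The goal is to help you take on object detection with the YOLO series.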
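Contents
1. Principle
2. Adding C2f_RFAConv to the YOLOv8 network
2.1 C2f_RFAConv code implementation
2.2 Walkthrough of the C2f_RFAConv module code
2.3 Modify the __init__.py file
2.4 Add the yaml file
2.5 Register the modules
2.6 Run the program
3. Full code
4. GFLOPs
5. Going further
6. Summary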
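1. Principle
Paper: RFAConv: Innovating Spatial Attention and Standard Convolutional Operation
Official code: official code repository
RFAConv (Receptive-Field Attention Convolution) is a novel convolution operation designed to address the limitations of standard convolution and existing spatial attention mechanisms, particularly with respect to parameter sharing and large convolution kernels.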
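Key principles behind RFAConv:
- Receptive-field spatial features: unlike conventional spatial attention, which focuses on individual spatial features, RFAConv emphasizes receptive-field spatial features, which are generated dynamically according to the kernel size. Weighting the importance of the different features inside each receptive field improves feature extraction.
- Addressing parameter sharing: in standard convolution the kernel parameters are shared across the whole input, which limits the network's ability to capture position-specific information. RFAConv combines attention with convolution so that each receptive field effectively gets its own, non-shared weights.
- Integrated attention: RFAConv assigns an importance to every feature inside a receptive field, letting the network focus on the most informative ones. This avoids a limitation of conventional attention mechanisms such as CBAM and CA, which share attention weights across different spatial regions.
- Efficient and lightweight: despite the added attention, RFAConv adds only a small amount of computation and parameters. It uses techniques such as group convolution to extract receptive-field spatial features efficiently, making it suitable for real-time applications.
- Performance gains: by addressing the limitations of spatial attention and of convolutional parameter sharing, RFAConv improves performance on classification, object detection, and segmentation; the paper reports that it outperforms other attention-based methods such as CBAM and CA in many cases.
In summary, RFAConv innovates by focusing on receptive-field spatial features, offering a more flexible and powerful alternative to standard convolution while remaining efficient.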
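To make the parameter-sharing argument concrete, here is a minimal PyTorch sketch of the idea, not the paper's implementation: it unfolds every k×k receptive field with F.unfold, reweights its k² elements with a per-position softmax, and then applies one shared kernel. Because the attention differs per position, the effective kernel (shared kernel × attention) changes from location to location. The function name rf_weighted_conv, the random attention logits, and the stride-1 setting are illustrative assumptions; the real RFAConv in section 2.1 produces the features with a grouped conv + BN + ReLU and the logits with average pooling + a 1×1 grouped conv.

```python
import torch
import torch.nn.functional as F


def rf_weighted_conv(x, weight, attn_logits):
    """Hypothetical illustration: a convolution whose receptive fields are reweighted per position.

    x:           (B, C, H, W) input feature map
    weight:      (O, C, k, k) kernel shared by every position, as in a standard conv
    attn_logits: (B, C, k*k, H, W) one score per receptive-field element and position
    Assumes stride 1 and 'same' padding with an odd k.
    """
    b, c, h, w = x.shape
    o, _, k, _ = weight.shape
    # Each column of the unfolded tensor is one k x k receptive field: (B, C*k*k, H*W)
    patches = F.unfold(x, kernel_size=k, padding=k // 2).view(b, c, k * k, h, w)
    # Softmax over the k*k elements of every receptive field, as RFAConv does
    attn = attn_logits.softmax(dim=2)
    weighted = (patches * attn).reshape(b, c * k * k, h * w)
    # Shared kernel applied to reweighted patches, so the effective kernel varies by position
    out = weight.reshape(o, c * k * k) @ weighted  # (B, O, H*W)
    return out.reshape(b, o, h, w)


torch.manual_seed(0)
x = torch.randn(1, 4, 8, 8)
kernel = torch.randn(6, 4, 3, 3)
logits = torch.randn(1, 4, 9, 8, 8)
y = rf_weighted_conv(x, kernel, logits)
print(y.shape)  # torch.Size([1, 6, 8, 8])

# Sanity check: uniform attention (all logits equal) reduces to a plain conv scaled by 1/k^2
uniform = rf_weighted_conv(x, kernel, torch.zeros_like(logits))
print(torch.allclose(uniform, F.conv2d(x, kernel, padding=1) / 9, atol=1e-5))  # True
```

The uniform-attention check shows where the gain comes from: with equal weights every position sees the same kernel (ordinary parameter sharing), and only learned, position-dependent attention breaks that sharing.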
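2. Adding C2f_RFAConv to the YOLOv8 network
2.1 C2f_RFAConv code implementation
Key step 1: paste the code below into /ultralytics/ultralytics/nn/modules/block.py, and add "RFAConv" and "C2f_RFAConv" to __all__ in that file (the YAML files below use both modules). The code requires the einops package (pip install einops).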
```python
import torch
from torch import nn
from einops import rearrange  # extra dependency: pip install einops


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation).

    block.py already imports Conv from .conv; this copy only keeps the snippet self-contained.
    """

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation (used after BatchNorm has been fused into the conv)."""
        return self.act(self.conv(x))


class RFAConv(nn.Module):
    """Receptive-Field Attention convolution (group-convolution implementation)."""

    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1):
        super().__init__()
        self.kernel_size = kernel_size
        # Attention branch: pooled context -> k*k scores per channel and position
        self.get_weight = nn.Sequential(
            nn.AvgPool2d(kernel_size=kernel_size, padding=kernel_size // 2, stride=stride),
            nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=1, groups=in_channel, bias=False))
        # Feature branch: k*k receptive-field features per channel (grouped conv)
        self.generate_feature = nn.Sequential(
            nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=kernel_size, padding=kernel_size // 2,
                      stride=stride, groups=in_channel, bias=False),
            nn.BatchNorm2d(in_channel * (kernel_size ** 2)),
            nn.ReLU())
        # Stride-k conv over the (h*k, w*k) map: each k x k window is exactly one receptive field
        self.conv = Conv(in_channel, out_channel, k=kernel_size, s=kernel_size, p=0)

    def forward(self, x):
        b, c = x.shape[0:2]
        weight = self.get_weight(x)
        h, w = weight.shape[2:]
        # b c*k**2 h w -> b c k**2 h w, softmax over the k**2 elements of each receptive field
        weighted = weight.view(b, c, self.kernel_size ** 2, h, w).softmax(2)
        feature = self.generate_feature(x).view(b, c, self.kernel_size ** 2, h, w)  # b c k**2 h w
        weighted_data = feature * weighted
        # b c k**2 h w -> b c (h*k) (w*k)
        conv_data = rearrange(weighted_data, 'b c (n1 n2) h w -> b c (h n1) (w n2)',
                              n1=self.kernel_size, n2=self.kernel_size)
        return self.conv(conv_data)


class Bottleneck_RFAConv(nn.Module):
    """Standard bottleneck whose second convolution is replaced by RFAConv."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and
        expansion. g and k[1] are kept for API compatibility but unused (RFAConv uses its default 3x3 kernel).
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = RFAConv(c_, c2)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Apply cv1 and RFAConv, adding the input back when the residual shortcut is enabled."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class C2f_RFAConv(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions, using Bottleneck_RFAConv blocks."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initialize CSP bottleneck layer with two convolutions with arguments ch_in, ch_out, number, shortcut, groups,
        expansion.
        """
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(
            Bottleneck_RFAConv(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))  # two halves of self.c channels each
        y.extend(m(y[-1]) for m in self.m)  # each bottleneck refines the latest branch
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


class C3_RFAConv(C3):  # C3 is defined earlier in ultralytics/nn/modules/block.py
    """C3 block whose bottlenecks use RFAConv."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        self.m = nn.Sequential(*(Bottleneck_RFAConv(c_, c_, shortcut, g, k=(1, 3), e=1.0) for _ in range(n)))
```
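RFAConv uses einops for a single rearrange call. If you prefer not to add that dependency, the same reordering can be written with view/permute/reshape. The sketch below (unfold_to_grid is a hypothetical helper name) implements 'b c (n1 n2) h w -> b c (h n1) (w n2)' in plain PyTorch and checks the index mapping: element n1·k + n2 of the receptive field at (i, j) lands at row i·k + n1, column j·k + n2, so each k×k block of the output is one receptive field.

```python
import torch


def unfold_to_grid(t, k):
    """Pure-PyTorch equivalent of
    rearrange(t, 'b c (n1 n2) h w -> b c (h n1) (w n2)', n1=k, n2=k).
    """
    b, c, _, h, w = t.shape
    t = t.view(b, c, k, k, h, w)          # split k*k into (n1, n2)
    t = t.permute(0, 1, 4, 2, 5, 3)       # -> b c h n1 w n2
    return t.reshape(b, c, h * k, w * k)  # merge (h n1) and (w n2)


k, h, w = 3, 2, 4
t = torch.arange(2 * 5 * k * k * h * w).view(2, 5, k * k, h, w)
grid = unfold_to_grid(t, k)
print(grid.shape)  # torch.Size([2, 5, 6, 12])

# Receptive field (i, j) occupies the k x k block grid[..., i*k:(i+1)*k, j*k:(j+1)*k]
i, j = 1, 2
block = grid[0, 0, i * k:(i + 1) * k, j * k:(j + 1) * k]
print(torch.equal(block.flatten(), t[0, 0, :, i, j]))  # True
```

With this helper, the rearrange line in RFAConv.forward could be replaced by conv_data = unfold_to_grid(weighted_data, self.kernel_size); the layout check suggests the two are interchangeable, but it is worth comparing outputs once in your own environment.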
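2.2 Walkthrough of the C2f_RFAConv module code
C2f_RFAConv is a modified CSP (Cross Stage Partial) bottleneck built on Ultralytics' C2f block (which its docstring describes as a faster CSP implementation), with the aim of improving the network's feature representation. It combines RFAConv (Receptive-Field Attention Convolution) with regular convolutions, using RFAConv to address the parameter-sharing problem of standard convolution.
- Input and channel split: the input first passes through a 1×1 Conv (cv1) that outputs 2·c channels (c = c2 × e), which the chunk function splits into two halves. Both halves are kept for the final concatenation, and the second half is also fed into the chain of Bottleneck_RFAConv modules.
- Bottleneck structure: each Bottleneck_RFAConv contains two stages: a 3×3 Conv (inside C2f_RFAConv the expansion is e=1.0, so the channel count stays at c), followed by RFAConv. RFAConv uses receptive-field attention to weight the features at different positions differently, so that the information in each receptive field is used more fully. When shortcut is enabled, the bottleneck input is added back as a residual.
- Feature fusion: the output of every Bottleneck_RFAConv, together with the two halves from the split, is concatenated with torch.cat, giving (2 + n)·c channels that combine features from different depths.
- Final output: a second 1×1 Conv (cv2) compresses the concatenated features to c2 channels to form the output.
Overall, the block fuses features from several depths and relies on RFAConv's position-specific weighting to mitigate the parameter sharing of standard convolution; the paper reports higher accuracy at a small extra computational cost.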
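The split / extend / concat flow described above can be traced with plain convolutions standing in for Bottleneck_RFAConv. TinyC2f below is a hypothetical, stripped-down stand-in (no BatchNorm, activation, shortcut or RFAConv) that keeps only the channel bookkeeping of C2f_RFAConv: cv1 produces 2c channels, chunk splits them, each of the n blocks consumes the latest branch and appends c channels, and cv2 maps (2 + n)·c back to c2. The arguments mirror layer 12 of the training log in 2.6 ([384, 128, ...]), with n=2 chosen for illustration.

```python
import torch
from torch import nn


class TinyC2f(nn.Module):
    """Channel bookkeeping of C2f_RFAConv; 3x3 convs stand in for Bottleneck_RFAConv (illustration only)."""

    def __init__(self, c1, c2, n=1, e=0.5):
        super().__init__()
        self.c = int(c2 * e)                              # hidden channels per branch
        self.cv1 = nn.Conv2d(c1, 2 * self.c, 1)            # c1 -> 2c
        self.cv2 = nn.Conv2d((2 + n) * self.c, c2, 1)      # (2 + n)c -> c2
        self.m = nn.ModuleList(nn.Conv2d(self.c, self.c, 3, padding=1) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, 1))   # two tensors with c channels each
        y.extend(m(y[-1]) for m in self.m)  # each block refines the most recent branch
        return self.cv2(torch.cat(y, 1))    # concatenate all 2 + n branches


block = TinyC2f(c1=384, c2=128, n=2)
out = block(torch.randn(1, 384, 20, 20))
print(block.cv2.in_channels, tuple(out.shape))  # 256 (1, 128, 20, 20)
```

With c = 64 and n = 2, the concatenation carries (2 + 2) × 64 = 256 channels before cv2 compresses them to 128, which is why cv2 in C2f_RFAConv is declared as Conv((2 + n) * self.c, c2, 1).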
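2.3 Modify the __init__.py file
Key step 2: in ultralytics/nn/modules/__init__.py, first import the new modules by adding RFAConv and C2f_RFAConv to the names imported from .block,
then declare both of them in __all__ below.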
2.4 Add the yaml file
Key step 3: create a new file yolov8_C2f_RFAConv.yaml under /ultralytics/ultralytics/cfg/models/v8 and paste in the following content
- OD (object detection)
```yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, RFAConv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, RFAConv, [128, 3, 2]] # 1-P2/4
  - [-1, 3, C2f_RFAConv, [128, True]]
  - [-1, 1, RFAConv, [256, 3, 2]] # 3-P3/8
  - [-1, 6, C2f_RFAConv, [256, True]]
  - [-1, 1, RFAConv, [512, 3, 2]] # 5-P4/16
  - [-1, 6, C2f_RFAConv, [512, True]]
  - [-1, 1, RFAConv, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f_RFAConv, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f_RFAConv, [512]] # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f_RFAConv, [256]] # 15 (P3/8-small)

  - [-1, 1, RFAConv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f_RFAConv, [512]] # 18 (P4/16-medium)

  - [-1, 1, RFAConv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2f_RFAConv, [1024]] # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]] # Detect(P3, P4, P5)
```
- Seg (instance segmentation)
```yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8-seg instance segmentation model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/segment

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-seg.yaml' will call yolov8-seg.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]
  s: [0.33, 0.50, 1024]
  m: [0.67, 0.75, 768]
  l: [1.00, 1.00, 512]
  x: [1.00, 1.25, 512]

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, RFAConv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, RFAConv, [128, 3, 2]] # 1-P2/4
  - [-1, 3, C2f_RFAConv, [128, True]]
  - [-1, 1, RFAConv, [256, 3, 2]] # 3-P3/8
  - [-1, 6, C2f_RFAConv, [256, True]]
  - [-1, 1, RFAConv, [512, 3, 2]] # 5-P4/16
  - [-1, 6, C2f_RFAConv, [512, True]]
  - [-1, 1, RFAConv, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f_RFAConv, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f_RFAConv, [512]] # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f_RFAConv, [256]] # 15 (P3/8-small)

  - [-1, 1, RFAConv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f_RFAConv, [512]] # 18 (P4/16-medium)

  - [-1, 1, RFAConv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2f_RFAConv, [1024]] # 21 (P5/32-large)

  - [[15, 18, 21], 1, Segment, [nc, 32, 256]] # Segment(P3, P4, P5)
```
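Tip: this article only adds a module on top of YOLOv8. To build the n/s/m/l/x variants, specify the corresponding depth_multiple and width_multiple (values below). If this is unclear, see the article: YOLOv8 yaml file explained. Note that when the file keeps its scales block, Ultralytics takes depth and width from scales rather than from these keys, so either delete scales before adding them, or keep scales and put the scale letter in the file name (for example, yolov8s_C2f_RFAConv.yaml selects s; a name without a letter, such as yolov8_C2f_RFAConv.yaml, falls back to the first entry, n).
```yaml
# Use one of the following groups
# YOLOv8n
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.25 # layer channel multiple
max_channels: 1024 # max_channels
# YOLOv8s
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
max_channels: 1024 # max_channels
# YOLOv8m
depth_multiple: 0.67 # model depth multiple
width_multiple: 0.75 # layer channel multiple
max_channels: 768 # max_channels
# YOLOv8l
depth_multiple: 1.00 # model depth multiple
width_multiple: 1.00 # layer channel multiple
max_channels: 512 # max_channels
# YOLOv8x
depth_multiple: 1.00 # model depth multiple
width_multiple: 1.25 # layer channel multiple
max_channels: 512 # max_channels
```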
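How these multipliers turn the YAML values into the numbers in the training log of 2.6 can be checked with a little arithmetic. The sketch follows my reading of Ultralytics' parse_model (treat it as an approximation, not the library code): repeats become max(round(n × depth), 1) when n > 1, and output channels become make_divisible(min(c, max_channels) × width, 8), i.e. rounded up to a multiple of 8. scale_layer is a hypothetical helper name.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(repeats, channels, depth, width, max_channels):
    """Return (scaled repeats, scaled output channels) for one YAML row (hypothetical helper)."""
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    c = make_divisible(min(channels, max_channels) * width, 8)
    return n, c


# YOLOv8n: depth 0.33, width 0.25, max_channels 1024
print(scale_layer(1, 64, 0.33, 0.25, 1024))    # (1, 16)  layer 0: RFAConv [3, 16, 3, 2]
print(scale_layer(6, 256, 0.33, 0.25, 1024))   # (2, 64)  layer 4: C2f_RFAConv [64, 64, 2, True]
print(scale_layer(3, 1024, 0.33, 0.25, 1024))  # (1, 256) layer 8: C2f_RFAConv [256, 256, 1, True]
# YOLOv8x: depth 1.00, width 1.25, max_channels 512
print(scale_layer(3, 1024, 1.00, 1.25, 512))   # (3, 640)
```

The last line shows why max_channels matters for the large models: 1024 is first capped at 512 and only then multiplied by 1.25.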
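2.5 Register the modules
Key step 4: register the modules in the parse_model function of ultralytics/nn/tasks.py: import RFAConv and C2f_RFAConv in tasks.py, add both to the set of modules whose arguments are built as [c1, c2, *args], and also add C2f_RFAConv to the set of C2f-style modules that receive the repeat count as an argument.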
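Registration matters because parse_model builds each module's constructor arguments according to its type. The sketch below is a simplified, hypothetical re-implementation of that branch (build_args, CHANNEL_MODULES and REPEAT_MODULES are illustrative names, not Ultralytics identifiers, and details such as the nc check are omitted): modules in the first set get [c1, c2, *args] with c2 scaled by width, and C2f-style modules additionally get the repeat count inserted at index 2 and are instantiated once. It reproduces the argument lists shown in the training log in 2.6.

```python
import math

# Illustrative stand-ins for the module sets inside parse_model (not the real Ultralytics names)
CHANNEL_MODULES = {"Conv", "C2f", "RFAConv", "C2f_RFAConv"}  # args become [c1, c2, *rest]
REPEAT_MODULES = {"C2f", "C2f_RFAConv"}                     # repeat count folded into args


def build_args(module, c1, yaml_args, repeats, depth=0.33, width=0.25, max_channels=1024):
    """Return (constructor args, number of module copies) for one YAML row (hypothetical helper)."""
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    args = list(yaml_args)
    if module in CHANNEL_MODULES:
        c2 = math.ceil(min(args[0], max_channels) * width / 8) * 8  # make_divisible(..., 8)
        args = [c1, c2, *args[1:]]
        if module in REPEAT_MODULES:
            args.insert(2, n)  # the C2f block loops over its n bottlenecks internally
            n = 1
    return args, n


print(build_args("RFAConv", 3, [64, 3, 2], 1))        # ([3, 16, 3, 2], 1)
print(build_args("C2f_RFAConv", 32, [128, True], 3))  # ([32, 32, 1, True], 1)
print(build_args("C2f_RFAConv", 384, [512], 3))       # ([384, 128, 1], 1)
```

In my reading of parse_model, a module missing from these sets falls through to a branch that keeps the raw YAML arguments, so an unregistered RFAConv would be built as RFAConv(64, 3, 2) with the wrong input channels, which is why both registrations are needed.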
2.6 Run the program
In train.py, set the model path to the path of yolov8_C2f_RFAConv.yaml.
An absolute path is recommended so the file is guaranteed to be found.
```python
import warnings
from pathlib import Path

from ultralytics import YOLO

warnings.filterwarnings('ignore')

if __name__ == '__main__':
    # Load the model from the new YAML file (an absolute path is recommended)
    model = YOLO("ultralytics/cfg/models/v8/yolov8_C2f_RFAConv.yaml")
    # Train the model; the run is named after the YAML file
    results = model.train(data=r"path/to/your/dataset.yaml",  # path to your dataset YAML
                          epochs=100, batch=16, imgsz=640, workers=4, name=Path(model.cfg).stem)
```
🚀 Run the program; if output like the following appears, the modules were added successfully 🚀
```
                   from  n    params  module                                       arguments
  0                  -1  1       788  ultralytics.nn.modules.block.RFAConv         [3, 16, 3, 2]
  1                  -1  1      6400  ultralytics.nn.modules.block.RFAConv         [16, 32, 3, 2]
  2                  -1  1      9088  ultralytics.nn.modules.block.C2f_RFAConv     [32, 32, 1, True]
  3                  -1  1     22016  ultralytics.nn.modules.block.RFAConv         [32, 64, 3, 2]
  4                  -1  2     56576  ultralytics.nn.modules.block.C2f_RFAConv     [64, 64, 2, True]
  5                  -1  1     80896  ultralytics.nn.modules.block.RFAConv         [64, 128, 3, 2]
  6                  -1  2    211456  ultralytics.nn.modules.block.C2f_RFAConv     [128, 128, 2, True]
  7                  -1  1    309248  ultralytics.nn.modules.block.RFAConv         [128, 256, 3, 2]
  8                  -1  1    474112  ultralytics.nn.modules.block.C2f_RFAConv     [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    155136  ultralytics.nn.modules.block.C2f_RFAConv     [384, 128, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1     40704  ultralytics.nn.modules.block.C2f_RFAConv     [192, 64, 1]
 16                  -1  1     43904  ultralytics.nn.modules.block.RFAConv         [64, 64, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    130560  ultralytics.nn.modules.block.C2f_RFAConv     [192, 128, 1]
 19                  -1  1    161536  ultralytics.nn.modules.block.RFAConv         [128, 128, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1    506880  ultralytics.nn.modules.block.C2f_RFAConv     [384, 256, 1]
 22        [15, 18, 21]  1    897664  ultralytics.nn.modules.head.Detect           [80, [64, 128, 256]]
YOLOv8_C2f_RFAConv summary: 362 layers, 3271572 parameters, 3271556 gradients
```
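The params column can be reproduced by hand, which is a quick way to confirm the modules were built as intended. The sketch counts the weights of the bias-free convolutions plus one weight and one bias per BatchNorm channel (running statistics are buffers, not parameters), following the layer definitions in 2.1 with the default kernel size 3. The function names are hypothetical helpers.

```python
def conv_params(c1, c2, k=1, groups=1):
    """Ultralytics-style Conv: bias-free Conv2d plus BatchNorm2d (weight and bias per channel)."""
    return c2 * (c1 // groups) * k * k + 2 * c2


def rfaconv_params(c_in, c_out, k=3):
    """Parameters of RFAConv as defined in 2.1."""
    ck2 = c_in * k * k
    get_weight = ck2                          # 1x1 grouped conv, one input channel per group
    generate_feature = ck2 * k * k + 2 * ck2  # k x k grouped conv + BatchNorm2d
    final = conv_params(c_in, c_out, k)       # stride-k Conv on the (h*k, w*k) map
    return get_weight + generate_feature + final


def c2f_rfaconv_params(c1, c2, n=1, e=0.5):
    """Parameters of C2f_RFAConv: cv1, cv2 and n bottlenecks (3x3 Conv + RFAConv each)."""
    c = int(c2 * e)
    bottleneck = conv_params(c, c, 3) + rfaconv_params(c, c)
    return conv_params(c1, 2 * c) + conv_params((2 + n) * c, c2) + n * bottleneck


print(rfaconv_params(3, 16))            # 788    (layer 0)
print(rfaconv_params(128, 256))         # 309248 (layer 7)
print(c2f_rfaconv_params(32, 32, 1))    # 9088   (layer 2)
print(c2f_rfaconv_params(384, 128, 1))  # 155136 (layer 12)
```

The same helpers quantify "almost no extra parameters": for 128 to 256 channels RFAConv has 309,248 parameters versus 295,424 for a plain 3×3 Conv (about 4.7% more), while for the narrow first layer the gap is relatively larger (788 versus 464).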
3. Full code
https://pan.baidu.com/s/1cfDUFCeo8Xpupn_5crJLdw?pwd=88hq
Extraction code: 88hq
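4. GFLOPs
For how GFLOPs are calculated, see the article "Interview Notes for Algorithm Engineers | Convolution Basics: Convolution".
GFLOPs of the baseline YOLOv8n
GFLOPs after the modification
I have no GPU available at the moment, so please measure these yourself if you need them.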
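Until the GFLOPs are measured, the extra cost of RFAConv can be estimated analytically. The sketch counts multiply-accumulates (MACs) of the convolutions only, ignoring BatchNorm, ReLU, softmax and average pooling, for RFAConv versus a plain k×k Conv with the same stride, using the layer definitions in 2.1 and assuming a 640×640 input (layer 0 sees 640×640, layer 7 sees the 40×40 P4 map). It is an approximation, not a measurement; FLOPs are roughly 2 × MACs. For measured figures, the Ultralytics model summary (for example via YOLO(...).info()) reports GFLOPs when the thop package is installed. The function names are hypothetical.

```python
def conv_out(size, k, s):
    """Output size of a k x k conv/pool with padding k // 2 and stride s."""
    return (size + 2 * (k // 2) - k) // s + 1


def rfaconv_macs(c_in, c_out, size, k=3, s=2):
    """Approximate conv MACs of RFAConv on a size x size input (hypothetical helper)."""
    o = conv_out(size, k, s)                # spatial size after the attention/feature branches
    ck2 = c_in * k * k
    get_weight = o * o * ck2                # 1x1 grouped conv, one input channel per group
    generate_feature = o * o * ck2 * k * k  # k x k grouped conv, one input channel per group
    final = o * o * c_out * ck2             # stride-k conv on the (o*k, o*k) map -> (o, o)
    return get_weight + generate_feature + final


def conv_macs(c_in, c_out, size, k=3, s=2):
    """MACs of a plain k x k Conv with the same stride."""
    o = conv_out(size, k, s)
    return o * o * c_out * c_in * k * k


for c_in, c_out, size in [(3, 16, 640), (128, 256, 40)]:  # layer 0 and layer 7 of the log
    ratio = rfaconv_macs(c_in, c_out, size) / conv_macs(c_in, c_out, size)
    print(f"{c_in}->{c_out}: RFAConv / Conv MACs = {ratio:.3f}")
# 3->16: RFAConv / Conv MACs = 1.625
# 128->256: RFAConv / Conv MACs = 1.039
```

Under these assumptions the ratio simplifies to (1 + k² + c_out) / c_out, so the overhead is noticeable in the narrow early layers and small in the wide later ones.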
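5. Going further
RFAConv can be combined with other attention mechanisms, loss functions, and similar improvements to further improve detection performance.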
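6. Summary
RFAConv (Receptive-Field Attention Convolution) is a convolution operation designed to address the limitations of standard convolution and existing spatial attention mechanisms with respect to parameter sharing and large convolution kernels. Its core idea is to introduce receptive-field spatial features, generated dynamically according to the kernel size, which strengthens the network's ability to capture features at different positions. Standard convolution shares the same kernel parameters across all positions and therefore cannot fully exploit the differences between image regions; RFAConv combines attention with convolution and assigns different attention weights to each receptive field, which resolves the parameter-sharing problem. It also weights the importance of the features inside each receptive field, and its lightweight design keeps it efficient while substantially improving network performance. According to the paper, on classification, object detection, and semantic segmentation RFAConv achieves clear gains with very little additional computation and parameters, and it compares favorably with conventional attention mechanisms such as CBAM and CA. RFAConv therefore offers a more flexible and efficient alternative for improving the performance of convolutional neural networks.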