Paper: https://arxiv.org/abs/1811.11168
Source code: https://github.com/msracver/Deformable-ConvNets
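A conventional convolution splits the feature map into patches the same size as the kernel and convolves each one, and the position of every patch on the feature map is fixed. For objects with complex deformations, this kind of convolution may therefore not work very well. The traditional remedies are to enrich the dataset with more samples of complex deformation, to apply various data augmentations and tricks, and to hand-design features and algorithms.
The dataset and data augmentation approaches are somewhat "brute force": they usually converge slowly and need a fairly complex network to go with them. Hand-crafted features and algorithms, on the other hand, are simply "too hard". Object deformations in particular can vary endlessly, so that approach is inherently difficult and inflexible. This is where Deformable Conv makes its debut! It steps up to the podium and announces that it has a personality of its own: it can deform, it is not as rigid as a regular convolution, and it is more flexible, so it can handle the complex object deformations described above.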
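To make the contrast with a fixed sampling grid concrete, here is a minimal pure-Python sketch of a single output value of a 3x3 modulated deformable convolution, following the DCNv2 formulation y(p) = Σ_k w_k · x(p + p_k + Δp_k) · Δm_k. Each of the nine kernel taps reads the input at its regular grid position plus an offset (using bilinear interpolation) and is scaled by a modulation scalar. The helper names `bilinear` and `modulated_deform_point` are made up for illustration, and the offsets and masks are passed in by hand here rather than predicted by a convolution layer as in the real operator.

```python
import math


def bilinear(img, y, x):
    """Sample img (a list of rows) at fractional (y, x); pixels outside the image count as 0."""
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < len(img) and 0 <= xx < len(img[0]):
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * img[yy][xx]
    return total


def modulated_deform_point(img, kernel, cy, cx, offsets, masks):
    """One output value of a 3x3 modulated deformable conv centred at (cy, cx).

    offsets: nine (dy, dx) pairs, one per kernel tap (illustrative, hand-written here)
    masks:   nine modulation scalars in [0, 1], one per kernel tap
    """
    out = 0.0
    tap = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[tap]
            sample = bilinear(img, cy + ky + dy, cx + kx + dx)
            out += kernel[ky + 1][kx + 1] * sample * masks[tap]
            tap += 1
    return out


img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
ones = [[1.0] * 3 for _ in range(3)]
# Zero offsets and unit masks: identical to a regular 3x3 convolution at (1, 1)
plain = modulated_deform_point(img, ones, 1, 1, [(0.0, 0.0)] * 9, [1.0] * 9)
# Every tap shifted half a pixel to the right
shifted = modulated_deform_point(img, ones, 1, 1, [(0.0, 0.5)] * 9, [1.0] * 9)
print(plain, shifted)
```

With all offsets at zero and all masks at 1, the sketch reduces to an ordinary 3x3 convolution at that position, which is a handy sanity check for the idea.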
1. Add the DCnv2 config file
Add it to the /models/ directory
# parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, DCnv2, [128, 3, 2]],  # 1-P2/4 [32,64,3,2]
   [-1, 3, C3, [128]],  # [64,64,1]
   [-1, 1, DCnv2, [256, 3, 2]],  # 3-P3/8 [64,128,3,2]
   [-1, 6, C3, [256]],  # [128,128,2]
   [-1, 1, DCnv2, [512, 3, 2]],  # 5-P4/16 [128,256,3,2]
   [-1, 9, C3, [512]],  # [256,256,3]
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32 [256,512,3,2]
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]
# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
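As far as I can tell, the bracketed annotations in the backbone comments (for example `[32,64,3,2]` or `[64,64,1]`) are the values each layer ends up with after `width_multiple` and `depth_multiple` are applied. The sketch below reproduces that arithmetic, assuming the rounding rules used by YOLOv5 v6.x (output channels rounded up to a multiple of 8, repeat counts rounded with a minimum of 1). The helper names `scale_channels` and `scale_repeats` are hypothetical and are not part of YOLOv5.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_channels(c2, width_multiple):
    """Output channels of a layer after width scaling (hypothetical helper)."""
    return make_divisible(c2 * width_multiple, 8)


def scale_repeats(n, depth_multiple):
    """Number of module repeats after depth scaling (hypothetical helper)."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


width_multiple, depth_multiple = 0.50, 0.33
for c2 in (64, 128, 256, 512, 1024):
    print("channels", c2, "->", scale_channels(c2, width_multiple))
for n in (3, 6, 9):
    print("repeats", n, "->", scale_repeats(n, depth_multiple))
```

Under these assumptions, the first DCnv2 layer, written as `[128, 3, 2]` in the config, receives 32 input channels from layer 0 and produces 64 output channels, which matches the `[32,64,3,2]` comment.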
2. Configure common.py
Add the following template code to ./models/common.py
from torchvision.ops import DeformConv2d  # deformable convolution op (accepts an optional mask)


class DCnv2(nn.Module):
    # Conv + BN + act, followed by a modulated deformable conv + BN + act
    def __init__(self, c1, c2, k=3, s=1, p=1, g=1, act=True):
        super().__init__()
        # Regular convolution (always stride 1) that maps c1 -> c2 channels
        self.conv1 = nn.Conv2d(c1, c2, kernel_size=k, stride=1, padding=p, groups=g, bias=False)
        deformable_groups = 1
        offset_channels = 2 * k * k  # one (dy, dx) pair per kernel sampling point; 18 for k=3
        mask_channels = k * k  # one modulation scalar per sampling point; 9 for k=3
        # Offsets and masks use the same kernel size, stride and padding as the deformable conv,
        # so their spatial size matches its output
        self.conv2_offset = nn.Conv2d(c2, deformable_groups * offset_channels, kernel_size=k, stride=s, padding=p)
        self.conv2_mask = nn.Conv2d(c2, deformable_groups * mask_channels, kernel_size=k, stride=s, padding=p)
        # init_mask = torch.Tensor(np.zeros([mask_channels, 3, 3, 3])) + np.array([0.5])
        # self.conv2_mask.weight = torch.nn.Parameter(init_mask)
        self.conv2 = DeformConv2d(c2, c2, kernel_size=k, stride=s, padding=p, bias=True)
        self.bn1 = nn.BatchNorm2d(c2)
        self.act1 = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        self.bn2 = nn.BatchNorm2d(c2)
        self.act2 = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        x = self.act1(self.bn1(self.conv1(x)))
        offset = self.conv2_offset(x)  # sampling offsets for every output location
        mask = torch.sigmoid(self.conv2_mask(x))  # modulation scalars squashed into (0, 1)
        x = self.act2(self.bn2(self.conv2(x, offset=offset, mask=mask)))
        return x
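The 18 offset channels and 9 mask channels in the class above correspond to 2·k·k and k·k for a 3x3 kernel: one (dy, dx) pair and one modulation scalar per sampling point. The offset and mask tensors also need the same spatial size as the output of the deformable convolution, which holds here because all three layers share kernel size, stride and padding. The sketch below checks that arithmetic in plain Python with the standard convolution output-size formula (dilation 1). `conv_out` and `dcnv2_shapes` are illustrative names, and the 320x320 input is just an assumed feature-map size (what layer 0 would produce from a 640x640 image).

```python
def conv_out(size, k, s, p):
    """Output length of a convolution along one axis (dilation 1)."""
    return (size + 2 * p - k) // s + 1


def dcnv2_shapes(h, w, c2, k=3, s=1, p=1, deformable_groups=1):
    """(channels, height, width) of the intermediate tensors in a DCnv2 block."""
    # conv1 always runs with stride 1
    h1, w1 = conv_out(h, k, 1, p), conv_out(w, k, 1, p)
    # offset conv, mask conv and deformable conv all use stride s
    ho, wo = conv_out(h1, k, s, p), conv_out(w1, k, s, p)
    return {
        "conv1": (c2, h1, w1),
        "offset": (2 * deformable_groups * k * k, ho, wo),
        "mask": (deformable_groups * k * k, ho, wo),
        "out": (c2, ho, wo),
    }


shapes = dcnv2_shapes(320, 320, c2=64, s=2)
for name, shape in shapes.items():
    print(name, shape)
```

Because the downsampling happens in the second stage, the offsets are predicted from the full-resolution output of `conv1` but are laid out on the half-resolution output grid.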
3. Configure yolo.py
In models/yolo.py, find the parse_model() function and register the DCnv2 class there. DCnv2 goes in the position shown below.
if m in {
        Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
        BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, DCnv2}:
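Why the registration matters: for modules in this set, parse_model() prepends the input channel count and rescales the output channels before calling the constructor. A module that is not in the set gets its YAML arguments passed through unchanged. The sketch below is a simplified paraphrase of that branch, not the actual YOLOv5 code: `build_args` is a hypothetical helper, and it leaves out details such as the check against the Detect output size and the repeat-count insertion for C3-style modules.

```python
import math


def build_args(registered, ch_in, yaml_args, width_multiple):
    """Constructor arguments a module would receive (simplified, hypothetical helper)."""
    if registered:
        c1, c2 = ch_in, yaml_args[0]
        c2 = math.ceil(c2 * width_multiple / 8) * 8  # scale, then round up to a multiple of 8
        return [c1, c2, *yaml_args[1:]]
    # Unregistered modules receive the YAML arguments as written
    return list(yaml_args)


# Layer 1 of the config: [-1, 1, DCnv2, [128, 3, 2]], fed by 32 channels from layer 0
print(build_args(True, 32, [128, 3, 2], 0.50))
print(build_args(False, 32, [128, 3, 2], 0.50))
```

In the unregistered case, the constructor would read 128 as `c1`, 3 as `c2` and 2 as `k`, which is not what the config intends, so DCnv2 has to be added to the set.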
I came across a very good explanatory article on Zhihu. Source: https://zhuanlan.zhihu.com/p/52578771