Visual Attention Mechanisms: Channel Attention, Spatial Attention, Self-Attention, and Cross-Attention

2024/9/20 5:43:33

Computer Vision: Exploring Visual Attention Mechanisms (Channel, Spatial, Self-Attention, and Cross-Attention)

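Attention mechanisms have become one of the key techniques for improving model performance in computer vision. By imitating human visual attention, a model can process image data more efficiently, concentrating on important features and ignoring irrelevant information. This article walks through the main visual attention mechanisms (channel attention, spatial attention, self-attention, and cross-attention) and gives a PyTorch code example for each.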

Channel Attention

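Channel attention, exemplified by the Squeeze-and-Excitation (SE) module in SENet, strengthens a model's feature representation by emphasizing informative channels and suppressing less useful ones. The module reduces each channel to a single value with global average pooling (squeeze), passes the resulting vector through a small two-layer bottleneck that ends in a sigmoid (excitation), and multiplies each channel by its learned weight.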
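The squeeze, excitation, and rescaling steps can be written compactly as follows. This is a sketch of the SE computation as it is usually stated; the symbols ($z$, $s$, $W_1$, $W_2$, the reduction ratio $r$) are notation introduced here, with $\delta$ denoting ReLU and $\sigma$ the sigmoid.

```latex
\begin{align}
z_c &= \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j)
  && \text{(squeeze: global average pooling per channel)} \\
s &= \sigma\bigl(W_2 \, \delta(W_1 z)\bigr),
  \quad W_1 \in \mathbb{R}^{\frac{C}{r} \times C},\;
        W_2 \in \mathbb{R}^{C \times \frac{C}{r}}
  && \text{(excitation: bottleneck with reduction ratio } r\text{)} \\
\tilde{x}_c &= s_c \cdot x_c
  && \text{(rescale channel } c \text{ by its weight)}
\end{align}
```

Because $s_c \in (0, 1)$, the module can only attenuate channels relative to one another; it never amplifies a channel beyond its input magnitude.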

Code example

import torch
import torch.nn as nn

class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

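    def forward(self, x):
        b, c, _, _ = x.size()
        # Squeeze: global average pooling reduces each channel to one value.
        y = self.avg_pool(x).view(b, c)
        # Excitation: the bottleneck MLP yields one weight in (0, 1) per channel.
        y = self.fc(y).view(b, c, 1, 1)
        # Rescale every channel of the input by its weight.
        return x * y.expand_as(x)

# Example input feature map: (batch, channels, height, width)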
input_feature = torch.randn(1, 64, 56, 56)
se_layer = SELayer(channel=64)
output_feature = se_layer(input_feature)

Spatial Attention

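Spatial attention focuses on the important spatial locations in an image, typically by learning an importance weight for every position of the feature map and multiplying the features by that weight map.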
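One common formulation, used by the spatial branch of CBAM and assumed here for illustration, pools the feature map $F$ along the channel axis in two ways, concatenates the results, and applies a single $k \times k$ convolution followed by a sigmoid. The symbols $M_s$, $f^{k \times k}$, and $F'$ are notation introduced for this sketch.

```latex
\begin{align}
M_s(F) &= \sigma\Bigl( f^{k \times k}\bigl( [\, \operatorname{AvgPool}_{c}(F) \,;\, \operatorname{MaxPool}_{c}(F) \,] \bigr) \Bigr)
  \in \mathbb{R}^{1 \times H \times W} \\
F' &= M_s(F) \odot F
\end{align}
```

Here $\operatorname{AvgPool}_c$ and $\operatorname{MaxPool}_c$ reduce over channels only, so the weight map keeps the full spatial resolution, and $\odot$ broadcasts the single-channel map across all channels of $F$.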

Code example

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size % 2 == 1, "Kernel size must be odd."
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=kernel_size//2, bias=False)
        self.sigmoid = nn.Sigmoid()

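    def forward(self, x):
        # Summarize the channel axis with its mean and max: two (B, 1, H, W) maps.
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        descriptor = torch.cat([avg_out, max_out], dim=1)
        # A single convolution plus sigmoid gives one weight per spatial position.
        attention = self.sigmoid(self.conv1(descriptor))
        # Reweight the original input; the map broadcasts across channels.
        return x * attention

# Example input feature map: (batch, channels, height, width)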
input_feature = torch.randn(1, 64, 56, 56)
spatial_attention = SpatialAttention()
output_feature = spatial_attention(input_feature)

Self-Attention

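Self-attention, the mechanism at the core of the Transformer, lets a model capture long-range dependencies within a sequence. For images, each spatial position of a feature map is treated as one element of the sequence, so every position can attend to every other position.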
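Written out for a feature map flattened to $N = HW$ positions, the computation looks roughly like this. The notation ($X$, $W_q$, $W_k$, $W_v$, the reduced width $C'$) is introduced here for illustration, and the form shown is the unscaled variant with 1x1 projections.

```latex
\begin{align}
Q &= W_q X \in \mathbb{R}^{C' \times N}, \qquad
K = W_k X \in \mathbb{R}^{C' \times N}, \qquad
V = W_v X \in \mathbb{R}^{C \times N} \\
A &= \operatorname{softmax}\bigl(Q^{\top} K\bigr) \in \mathbb{R}^{N \times N}
  && \text{(softmax over each row, so every row sums to 1)} \\
O &= V A^{\top} \in \mathbb{R}^{C \times N}
\end{align}
```

The Transformer's scaled dot-product attention additionally divides the scores by $\sqrt{d_k}$ before the softmax, which keeps the logits from growing with the projection width. Note that $A$ has $N^2$ entries, so memory grows quadratically with the number of spatial positions.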

Code example

class SelfAttention(nn.Module):
    def __init__(self, in_dim):
        super(SelfAttention, self).__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, 1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, 1)
        self.softmax = nn.Softmax(dim=-1)

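    def forward(self, x):
        # x is (B, C, H, W); the N = H * W spatial positions act as the sequence.
        batch_size, C, height, width = x.size()
        query = self.query_conv(x).view(batch_size, -1, height*width).permute(0, 2, 1)  # (B, N, C//8)
        key = self.key_conv(x).view(batch_size, -1, height*width)  # (B, C//8, N)
        energy = torch.bmm(query, key)  # (B, N, N) pairwise similarities
        attention = self.softmax(energy)  # each row sums to 1 over the keys
        value = self.value_conv(x).view(batch_size, -1, height*width)  # (B, C, N)

        out = torch.bmm(value, attention.permute(0, 2, 1))  # (B, C, N)
        out = out.view(batch_size, C, height, width)
        return out

# Example input feature map: (batch, channels, height, width)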
input_feature = torch.randn(1, 64, 56, 56)
self_attention = SelfAttention(in_dim=64)
output_feature = self_attention(input_feature)
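For comparison, the same idea can be sketched with PyTorch's built-in `nn.MultiheadAttention` by flattening the feature map into a token sequence. This is an illustrative alternative rather than an equivalent of the module above: the built-in layer scales the scores, splits the channels into several heads, and applies an output projection. The sizes and the choice of 4 heads are assumptions made to keep the attention matrix small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes chosen to keep the N x N attention matrix small.
batch, channels, height, width = 1, 64, 14, 14
feature_map = torch.randn(batch, channels, height, width)

# Flatten the spatial grid into a sequence of N = H * W tokens: (B, N, C).
tokens = feature_map.flatten(2).transpose(1, 2)

# Built-in multi-head attention; batch_first=True means inputs are (B, N, C).
mha = nn.MultiheadAttention(embed_dim=channels, num_heads=4, batch_first=True)

# Self-attention: query, key, and value are all the same token sequence.
attended, weights = mha(tokens, tokens, tokens)

# Restore the (B, C, H, W) layout expected by convolutional layers.
output_map = attended.transpose(1, 2).reshape(batch, channels, height, width)

print(output_map.shape)  # torch.Size([1, 64, 14, 14])
print(weights.shape)     # torch.Size([1, 196, 196]), averaged over heads
```

The returned weights have one row per query position, and each row is a distribution over all 196 positions of the feature map.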

Cross-Attention

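Cross-attention is typically used in sequence-to-sequence tasks such as machine translation, where it lets the model consult the input sequence while generating the output sequence. Applied to vision, the queries come from one feature map while the keys and values come from another.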

Code example

class CrossAttention(nn.Module):
    def __init__(self, query_dim, key_dim):
        super(CrossAttention, self).__init__()
        # Queries and keys must share one projection width for the dot product.
        inner_dim = query_dim // 8
        self.query_conv = nn.Conv2d(query_dim, inner_dim, 1)
        self.key_conv = nn.Conv2d(key_dim, inner_dim, 1)
        self.value_conv = nn.Conv2d(key_dim, query_dim, 1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, query, key, value):
        # query is (B, C_q, H, W); key and value are (B, C_k, H_k, W_k).
        batch_size, C_q, height, width = query.size()
        batch_size_k, C_k, height_k, width_k = key.size()

        query = self.query_conv(query).view(batch_size, -1, height*width).permute(0, 2, 1)  # (B, N_q, d)
        key = self.key_conv(key).view(batch_size_k, -1, height_k*width_k)  # (B, d, N_k)
        energy = torch.bmm(query, key)  # (B, N_q, N_k)
        attention = self.softmax(energy)  # normalize over the key positions
        value = self.value_conv(value).view(batch_size_k, -1, height_k*width_k)  # (B, C_q, N_k)

        out = torch.bmm(value, attention.permute(0, 2, 1))  # (B, C_q, N_q)
        out = out.view(batch_size, -1, height, width)
        return out

# Example inputs: the query map, plus the key and value maps it attends to
query = torch.randn(1, 64, 56, 56)
key = torch.randn(1, 64, 56, 56)
value = torch.randn(1, 64, 56, 56)
cross_attention = CrossAttention(query_dim=64, key_dim=64)
output_feature = cross_attention(query, key, value)
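The example above uses query and key maps of the same spatial size, but cross-attention does not require that. The following minimal sketch shows a query map attending to a smaller context map. `cross_attend` is a hypothetical helper written for this illustration: it has no learned projections, assumes both maps share the same channel count, and scales the scores by the square root of the channel count.

```python
import torch
import torch.nn.functional as F

def cross_attend(query_map, context_map):
    """Attend from every query position to every context position.

    Hypothetical helper: no learned projections, same channel count assumed.
    """
    b, c, hq, wq = query_map.shape
    q = query_map.flatten(2).transpose(1, 2)    # (B, Nq, C)
    k = context_map.flatten(2)                  # (B, C, Nk)
    v = context_map.flatten(2).transpose(1, 2)  # (B, Nk, C)
    # Scaled dot-product scores, normalized over the context positions.
    attention = F.softmax(torch.bmm(q, k) / c ** 0.5, dim=-1)  # (B, Nq, Nk)
    out = torch.bmm(attention, v)               # (B, Nq, C)
    return out.transpose(1, 2).reshape(b, c, hq, wq), attention

torch.manual_seed(0)
query_map = torch.randn(1, 32, 8, 8)    # 64 query positions
context_map = torch.randn(1, 32, 4, 4)  # 16 context positions
out, attention = cross_attend(query_map, context_map)

print(out.shape)        # torch.Size([1, 32, 8, 8])
print(attention.shape)  # torch.Size([1, 64, 16])
```

The output always takes the spatial size of the query map, while the attention matrix has one row per query position and one column per context position.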