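There are two families of feature map visualization methods. The first maps the feature map of a given layer directly to the 0-255 range and displays it as an image. The second uses a pretrained deconvolution network (deconvolution plus unpooling) to turn the feature map back into an image, which achieves the same goal of visualizing the feature map.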
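
The first approach can be illustrated with a minimal NumPy sketch. The helper name `feature_map_to_uint8` and the choice of min-max scaling are assumptions made for illustration; the text only says that the values are mapped to the 0-255 range.

```python
import numpy as np


def feature_map_to_uint8(feature_map: np.ndarray) -> np.ndarray:
    """Min-max scale a single 2-D feature map to the 0-255 range (hypothetical helper)."""
    fmap = feature_map.astype(np.float64)
    lo, hi = fmap.min(), fmap.max()
    if hi - lo < 1e-12:
        # A constant map has no contrast to display, so return an all-black image
        return np.zeros(fmap.shape, dtype=np.uint8)
    scaled = (fmap - lo) / (hi - lo)          # now in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)


# Toy 2x2 "feature map" with negative and positive activations
demo = feature_map_to_uint8(np.array([[-1.0, 0.0], [1.0, 3.0]]))
```

The smallest activation becomes 0 and the largest becomes 255, so every map uses the full grayscale range regardless of its original scale.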



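(3) Class activation visualization (Class Activation Mapping, CAM)

A CAM (Class Activation Map) is also known as a class heatmap or saliency map. It has the same size as the original image, and each pixel value indicates how strongly the corresponding region of the original image influences the predicted output: the larger the value, the larger the contribution. Commonly used members of the CAM family include CAM, Grad-CAM, and Grad-CAM++.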





from torch.utils.tensorboard import SummaryWriter



1. The most general and direct feature visualization method



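import torch
from torchvision import models, transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np


# Load the input image
def get_image_info(image_dir):
    # Open the image in RGB format.
    # The PyTorch DataLoader works with images as read by PIL, so reading
    # images this way is recommended; use convert('') when reading a grayscale image.
    image_info = Image.open(image_dir).convert('RGB')
    # Preprocessing pipeline
    image_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    image_info = image_transform(image_info)
    # Add the batch dimension
    image_info = image_info.unsqueeze(0)
    return image_info


# Get the feature map of layer k
def get_k_layer_feature_map(feature_extractor, k, x):
    with torch.no_grad():
        for index, layer in enumerate(feature_extractor):
            x = layer(x)
            if k == index:
                return x


# Visualize the feature map
def show_feature_map(feature_map):
    feature_map = feature_map.squeeze(0)
    feature_map = feature_map.cpu().numpy()
    feature_map_num = feature_map.shape[0]
    # plt.subplot needs integer grid sizes, so cast the result of np.ceil
    row_num = int(np.ceil(np.sqrt(feature_map_num)))
    plt.figure()
    for index in range(1, feature_map_num + 1):
        plt.subplot(row_num, row_num, index)
        plt.imshow(feature_map[index - 1], cmap='gray')
        plt.axis('off')
        # scipy.misc.imsave has been removed from SciPy; plt.imsave replaces it here
        plt.imsave(str(index) + ".png", feature_map[index - 1], cmap='gray')
    plt.show()


if __name__ == '__main__':
    # Path to the input image
    image_dir = r"husky.png"
    # Index of the layer whose feature map is extracted
    k = 1
    # Load the AlexNet model packaged with PyTorch
    # (newer torchvision releases prefer the `weights=` argument over `pretrained=True`)
    model = models.alexnet(pretrained=True)
    # Whether to run on the GPU
    use_gpu = torch.cuda.is_available()
    # Force CPU execution; remove this line to use the GPU when available
    use_gpu = False
    # Read the image
    image_info = get_image_info(image_dir)
    # Move model and input to the GPU if requested
    if use_gpu:
        model = model.cuda()
        image_info = image_info.cuda()
    # Only the `features` part of AlexNet produces feature maps;
    # the outputs of the `classifier` part are vectors
    feature_extractor = model.features
    feature_map = get_k_layer_feature_map(feature_extractor, k, image_info)
    show_feature_map(feature_map)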

2. Visualizing features with deconvolution


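As shown in the figure below, a deconvolution network takes the feature map of any layer of a trained neural network and reconstructs it back into pixel space. Its main operations are unpooling, rectification, and filtering; in other words, reverse pooling, reverse activation, and reverse convolution.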
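
The unpooling and rectification steps can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names are hypothetical, pooling is non-overlapping on a single 2-D map, and the filtering step (a transposed convolution with the learned filters) is left out.

```python
import numpy as np


def max_pool_with_switches(x, size=2):
    """Non-overlapping max pooling that also records where each maximum came from."""
    out_h, out_w = x.shape[0] // size, x.shape[1] // size
    pooled = np.zeros((out_h, out_w), dtype=x.dtype)
    switches = np.zeros((out_h, out_w, 2), dtype=np.int64)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[r, c]
            switches[i, j] = (i * size + r, j * size + c)
    return pooled, switches


def max_unpool(pooled, switches, out_shape):
    """Place each pooled value back at its recorded position; all other entries stay 0."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = switches[i, j]
            out[r, c] = pooled[i, j]
    return out


x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 5.],
              [0., 0., 7., 0.],
              [6., 0., 0., 0.]])
pooled, switches = max_pool_with_switches(x)
# Unpool, then rectify (ReLU) the reconstruction
restored = np.maximum(max_unpool(pooled, switches, x.shape), 0)
```

Recording the "switch" positions during the forward pass is what lets unpooling put each maximum back where it originated instead of spreading it over the whole window.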


For more implementation details, see the paper "Visualizing and Understanding Convolutional Networks".

There is also an improved variant, guided backpropagation, described in "Striving for Simplicity: The All Convolutional Net".




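The idea is as follows. Start from an image of random noise, in which every pixel is assigned a random color. Feed this noise image forward through the CNN and take the activation a_ij(x) of the j-th filter in the i-th layer. Then backpropagate to compute the gradient G = ∂F/∂I, where F denotes that activation and I the input image. The goal is to change each pixel's color value so that the filter's activation increases, so the image is updated iteratively by gradient ascent, I = I + η∗G, where η plays the role of a learning rate.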

import numpy as np
import tensorflow as tf
from tensorflow import keras
# The dimensions of our input image
img_width = 180
img_height = 180
# Our target layer: we will visualize the filters from this layer.
# See `model.summary()` for list of layer names, if you want to change this.
layer_name = "conv3_block4_out"

# Build a ResNet50V2 model loaded with pre-trained ImageNet weights
model = keras.applications.ResNet50V2(weights="imagenet", include_top=False)
# Set up a model that returns the activation values for our target layer
layer = model.get_layer(name=layer_name)
feature_extractor = keras.Model(inputs=model.inputs, outputs=layer.output)

# The loss is the mean response of the chosen filter, which we want to maximize
def compute_loss(input_image, filter_index):
    activation = feature_extractor(input_image)
    # We avoid border artifacts by only involving non-border pixels in the loss.
    filter_activation = activation[:, 2:-2, 2:-2, filter_index]
    return tf.reduce_mean(filter_activation)
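
def gradient_ascent_step(img, filter_index, learning_rate):
    with tf.GradientTape() as tape:
        # img is a plain tensor rather than a tf.Variable, so it must be watched explicitly
        tape.watch(img)
        loss = compute_loss(img, filter_index)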
    # Compute gradients.
    grads = tape.gradient(loss, img)
    # Normalize gradients.
    grads = tf.math.l2_normalize(grads)
    img += learning_rate * grads
    return loss, img

def initialize_image():
    # We start from a gray image with some random noise
    img = tf.random.uniform((1, img_width, img_height, 3))
    # ResNet50V2 expects inputs in the range [-1, +1].
    # Here we scale our random inputs to [-0.125, +0.125]
    return (img - 0.5) * 0.25

def visualize_filter(filter_index):
    # We run gradient ascent for 30 steps
    iterations = 30
    learning_rate = 10.0
    img = initialize_image()
    for iteration in range(iterations):
        loss, img = gradient_ascent_step(img, filter_index, learning_rate)

    # Decode the resulting input image
    img = deprocess_image(img[0].numpy())
    return loss, img

def deprocess_image(img):
    # Normalize array: center on 0., ensure standard deviation is 0.15
    img -= img.mean()
    img /= img.std() + 1e-5
    img *= 0.15

    # Center crop
    img = img[25:-25, 25:-25, :]

    # Clip to [0, 1]
    img += 0.5
    img = np.clip(img, 0, 1)

    # Convert to RGB array
    img *= 255
    img = np.clip(img, 0, 255).astype("uint8")
    return img



How convolutional neural networks see the world



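CAM stands for Class Activation Mapping; the resulting class activation map is also called a class heatmap or saliency map. It is an image of the same size as the original, in which each pixel takes a value between 0 and 1, usually rendered as a 0-255 grayscale image. It can be read as the distribution of contributions to the predicted output: the higher the score at a location, the stronger the network's response to the corresponding region of the original image and the larger that region's contribution. Common CAM methods are: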

1. CAM


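As shown in the figure above, the CAM architecture consists of a CNN feature extractor, global average pooling (GAP), a fully connected layer, and softmax. An image passes through the CNN feature extractor to produce feature maps; each feature map is globally average-pooled into a single value, giving a one-dimensional vector, which then goes through the fully connected layer and softmax to yield the class probabilities. If there are n channels before GAP, the output of GAP is a vector of shape 1 x n; with m classes, the weights of the fully connected layer form an n x m tensor (the batch size is ignored here). For a given class C, we want to visualize which regions of the original image matter most when the model recognizes class C, in other words, which information the model relies on to decide that the image belongs to class C.

The approach is to take, from the fully connected layer, the weights that produce the output for class C, denoted W (the lower half of the figure above), and use them to compute a weighted sum of the feature maps before GAP. Because these feature maps are smaller than the original image, the weighted sum must then be upsampled to obtain the Class Activation Map.

Expressed as a formula (k denotes the channel, c the class, and f_k(x, y) the feature map):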
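
The weighted sum described above can be written as the following formula. This is a sketch that assumes the notation of the CAM paper cited in the code further down, with $w_k^c$ the fully connected weight linking channel $k$ to class $c$ and $M_c$ the class activation map before upsampling:

```latex
M_c(x, y) = \sum_k w_k^c \, f_k(x, y)
```

Under the same assumptions, the class score fed to softmax is $S_c = \sum_k w_k^c \sum_{x,y} f_k(x, y)$ up to the constant factor introduced by averaging, so $M_c(x, y)$ is the contribution of position $(x, y)$ to that score.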


class CAM(_CAM):
    """Implements a class activation map extractor as described in `"Learning Deep Features for Discriminative
    Localization" <https://arxiv.org/pdf/1512.04150.pdf>`_.
    The Class Activation Map (CAM) is defined for image classification models that have global pooling at the end
    of the visual feature extraction block. The localization map is computed as follows:
    .. math::
        L^{(c)}_{CAM}(x, y) = ReLU\\Big(\\sum\\limits_k w_k^{(c)} A_k(x, y)\\Big)
    where :math:`A_k(x, y)` is the activation of node :math:`k` in the target layer of the model at
    position :math:`(x, y)`,
    and :math:`w_k^{(c)}` is the weight corresponding to class :math:`c` for unit :math:`k` in the fully
    connected layer.

    Example::
        >>> from torchvision.models import resnet18
        >>> from torchcam.cams import CAM
        >>> model = resnet18(pretrained=True).eval()
        >>> cam = CAM(model, 'layer4', 'fc')
        >>> with torch.no_grad(): out = model(input_tensor)
        >>> cam(class_idx=100)

    Args:
        model: input model
        target_layer: name of the target layer
        fc_layer: name of the fully connected layer
        input_shape: shape of the expected input tensor excluding the batch dimension
    """

    def __init__(
        self,
        model: nn.Module,
        target_layer: Optional[str] = None,
        fc_layer: Optional[str] = None,
        input_shape: Tuple[int, ...] = (3, 224, 224),
        **kwargs: Any,
    ) -> None:

        super().__init__(model, target_layer, input_shape, **kwargs)

        # If the layer is not specified, try automatic resolution
        if fc_layer is None:
            fc_layer = locate_linear_layer(model)
            # Warn the user of the choice
            if isinstance(fc_layer, str):
                logging.warning(f"no value was provided for `fc_layer`, thus set to '{fc_layer}'.")
            else:
                raise ValueError("unable to resolve `fc_layer` automatically, please specify its value.")
        # Softmax weight
        self._fc_weights = self.submodule_dict[fc_layer].weight.data
        # squeeze to accommodate replacement by Conv1x1
        if self._fc_weights.ndim > 2:
            self._fc_weights = self._fc_weights.view(*self._fc_weights.shape[:2])

    def _get_weights(self, class_idx: int, scores: Optional[Tensor] = None) -> Tensor:
        """Computes the weight coefficients of the hooked activation maps"""

        # Take the FC weights of the target class
        return self._fc_weights[class_idx, :]
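
Independently of the library code above, the CAM computation itself (weighted sum of the feature maps, then upsampling) can be sketched in plain NumPy. The function name, the nearest-neighbour upsampling, the ReLU, and the final min-max normalization are assumptions chosen for brevity; the weights are assumed to be stored as (m, n), one row per class, as PyTorch linear layers do.

```python
import numpy as np


def class_activation_map(feature_maps, fc_weights, class_idx, out_size):
    """feature_maps: (n, h, w); fc_weights: (m, n). Returns an (H, W) map scaled to [0, 1]."""
    w_c = fc_weights[class_idx]                      # weights of the target class, shape (n,)
    cam = np.tensordot(w_c, feature_maps, axes=1)    # weighted sum over channels, shape (h, w)
    cam = np.maximum(cam, 0)                         # keep positive evidence only
    # Nearest-neighbour upsampling to the input resolution
    out_h, out_w = out_size
    rows = np.arange(out_h) * cam.shape[0] // out_h
    cols = np.arange(out_w) * cam.shape[1] // out_w
    cam = cam[rows][:, cols]
    value_range = cam.max() - cam.min()
    if value_range == 0:
        return np.zeros_like(cam)
    return (cam - cam.min()) / value_range


# Two 2x2 feature maps and two classes
feature_maps = np.array([[[1., 0.], [0., 0.]],
                         [[0., 0.], [0., 1.]]])
fc_weights = np.array([[1., 0.],
                       [0., 2.]])
heat = class_activation_map(feature_maps, fc_weights, class_idx=1, out_size=(4, 4))
```

For class 1 only the second feature map carries weight, so the heatmap lights up the bottom-right quadrant of the 4x4 output.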

2. Grad-CAM

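Obtaining a CAM through GAP has its limitations:

1) the model must contain a GAP layer;


Grad-CAM was proposed to overcome the drawbacks above. Its main advantage is that the existing model architecture no longer has to be modified and no retraining is needed: the visualization is computed directly on the original model, and a heatmap can be extracted from any layer.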



class _GradCAM(_CAM):
    """Implements a gradient-based class activation map extractor

    Args:
        model: input model
        target_layer: name of the target layer
        input_shape: shape of the expected input tensor excluding the batch dimension
    """

    def __init__(
        self,
        model: torch.nn.Module,
        target_layer: Optional[str] = None,
        input_shape: Tuple[int, ...] = (3, 224, 224),
        **kwargs: Any,
    ) -> None:

        super().__init__(model, target_layer, input_shape, **kwargs)
        # Init hook
        self.hook_g: Optional[Tensor] = None
        # Ensure ReLU is applied before normalization
        self._relu = True
        # Model output is used by the extractor
        self._score_used = True
        # Trick to avoid issues with inplace operations cf. https://github.com/pytorch/pytorch/issues/61519

    def _store_grad(self, grad: Tensor) -> None:
        if self._hooks_enabled:
            self.hook_g = grad.data

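    def _hook_g(self, module: torch.nn.Module, input: Tensor, output: Tensor) -> None:
        """Gradient hook"""
        if self._hooks_enabled:
            # The body of this branch is missing in the source; assumed reconstruction:
            # register a tensor hook so the gradient of `output` is stored on backward
            output.register_hook(self._store_grad)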
    def _backprop(self, scores: Tensor, class_idx: int) -> None:
        """Backpropagate the loss for a specific output class"""

        if self.hook_a is None:
            raise TypeError("Inputs need to be forwarded in the model for the conv features to be hooked")

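        # Backpropagate to get the gradients on the hooked layer
        loss = scores[:, class_idx].sum()
        # The backward call is missing in the source; assumed reconstruction
        loss.backward(retain_graph=True)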

    def _get_weights(self, class_idx, scores):

        raise NotImplementedError

class GradCAM(_GradCAM):
    """Implements a class activation map extractor as described in `"Grad-CAM: Visual Explanations from Deep Networks
    via Gradient-based Localization" <https://arxiv.org/pdf/1610.02391.pdf>`_.
    The localization map is computed as follows:
    .. math::
        L^{(c)}_{Grad-CAM}(x, y) = ReLU\\Big(\\sum\\limits_k w_k^{(c)} A_k(x, y)\\Big)
    with the coefficient :math:`w_k^{(c)}` being defined as:
    .. math::
        w_k^{(c)} = \\frac{1}{H \\cdot W} \\sum\\limits_{i=1}^H \\sum\\limits_{j=1}^W
        \\frac{\\partial Y^{(c)}}{\\partial A_k(i, j)}
    where :math:`A_k(x, y)` is the activation of node :math:`k` in the target layer of the model at
    position :math:`(x, y)`,
    and :math:`Y^{(c)}` is the model output score for class :math:`c` before softmax.

    Example::
        >>> from torchvision.models import resnet18
        >>> from torchcam.cams import GradCAM
        >>> model = resnet18(pretrained=True).eval()
        >>> cam = GradCAM(model, 'layer4')
        >>> scores = model(input_tensor)
        >>> cam(class_idx=100, scores=scores)

    Args:
        model: input model
        target_layer: name of the target layer
        input_shape: shape of the expected input tensor excluding the batch dimension
    """

    def _get_weights(self, class_idx: int, scores: Tensor) -> Tensor:  # type: ignore[override]
        """Computes the weight coefficients of the hooked activation maps"""

        self.hook_g: Tensor
        # Backpropagate
        self._backprop(scores, class_idx)
        # Global average pool the gradients over spatial dimensions
        return self.hook_g.squeeze(0).flatten(1).mean(-1)
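
Stripped of hooks and model plumbing, the Grad-CAM computation reduces to a few array operations. The NumPy sketch below assumes that the activations and the gradients of the class score with respect to them have already been obtained for a single image; `grad_cam` is a hypothetical helper name.

```python
import numpy as np


def grad_cam(activations, gradients):
    """activations, gradients: (n, h, w) arrays for one image and one target class."""
    # Channel weights: global average of the gradients over the spatial dimensions
    weights = gradients.reshape(gradients.shape[0], -1).mean(axis=1)   # shape (n,)
    # Weighted sum of the activation maps, followed by ReLU
    cam = np.tensordot(weights, activations, axes=1)                   # shape (h, w)
    return np.maximum(cam, 0)


activations = np.array([[[1., 2.], [3., 4.]],
                        [[4., 3.], [2., 1.]]])
gradients = np.array([[[0.5, 0.5], [0.5, 0.5]],
                      [[-1., -1.], [-1., -1.]]])
heat = grad_cam(activations, gradients)
```

Here the first channel gets weight 0.5 and the second gets weight -1, so only the position where the first channel dominates survives the ReLU.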

3. Grad-CAM++

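Grad-CAM obtains the feature-map weights by averaging (GAP) the gradients of the target feature map, which amounts to treating every element of the gradient map as contributing equally. To obtain better results (especially when an image contains more than one object of a given class), Chattopadhyay et al. argued that the elements of the gradient map contribute differently and proposed Grad-CAM++. The main change is that the weight of each feature map for a given class now incorporates a ReLU and per-pixel weights on the gradients: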




class GradCAMpp(_GradCAM):
    """Implements a class activation map extractor as described in `"Grad-CAM++: Improved Visual Explanations for
    Deep Convolutional Networks" <https://arxiv.org/pdf/1710.11063.pdf>`_.
    The localization map is computed as follows:
    .. math::
        L^{(c)}_{Grad-CAM++}(x, y) = \\sum\\limits_k w_k^{(c)} A_k(x, y)
    with the coefficient :math:`w_k^{(c)}` being defined as:
    .. math::
        w_k^{(c)} = \\sum\\limits_{i=1}^H \\sum\\limits_{j=1}^W \\alpha_k^{(c)}(i, j) \\cdot
        ReLU\\Big(\\frac{\\partial Y^{(c)}}{\\partial A_k(i, j)}\\Big)
    where :math:`A_k(x, y)` is the activation of node :math:`k` in the target layer of the model at
    position :math:`(x, y)`,
    :math:`Y^{(c)}` is the model output score for class :math:`c` before softmax,
    and :math:`\\alpha_k^{(c)}(i, j)` being defined as:
    .. math::
        \\alpha_k^{(c)}(i, j) = \\frac{1}{\\sum\\limits_{i, j} \\frac{\\partial Y^{(c)}}{\\partial A_k(i, j)}}
        = \\frac{\\frac{\\partial^2 Y^{(c)}}{(\\partial A_k(i,j))^2}}{2 \\cdot
        \\frac{\\partial^2 Y^{(c)}}{(\\partial A_k(i,j))^2} + \\sum\\limits_{a,b} A_k (a,b) \\cdot
        \\frac{\\partial^3 Y^{(c)}}{(\\partial A_k(i,j))^3}}
    if :math:`\\frac{\\partial Y^{(c)}}{\\partial A_k(i, j)} = 1` else :math:`0`.

    Example::
        >>> from torchvision.models import resnet18
        >>> from torchcam.cams import GradCAMpp
        >>> model = resnet18(pretrained=True).eval()
        >>> cam = GradCAMpp(model, 'layer4')
        >>> scores = model(input_tensor)
        >>> cam(class_idx=100, scores=scores)

    Args:
        model: input model
        target_layer: name of the target layer
        input_shape: shape of the expected input tensor excluding the batch dimension
    """

    def _get_weights(self, class_idx: int, scores: Tensor) -> Tensor:  # type: ignore[override]
        """Computes the weight coefficients of the hooked activation maps"""

        self.hook_g: Tensor
        # Backpropagate
        self._backprop(scores, class_idx)
        # Alpha coefficient for each pixel
        grad_2 = self.hook_g.pow(2)
        grad_3 = grad_2 * self.hook_g
        # Watch out for NaNs produced by underflow
        spatial_dims = self.hook_a.ndim - 2  # type: ignore[union-attr]
        denom = 2 * grad_2 + (grad_3 * self.hook_a).flatten(2).sum(-1)[(...,) + (None,) * spatial_dims]
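        nan_mask = grad_2 > 0
        alpha = grad_2
        # Divide by the denominator only where the numerator is non-zero
        # (this step is missing in the source; without it `denom` is never used)
        alpha[nan_mask] = alpha[nan_mask] / denom[nan_mask]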

        # Apply pixel coefficient in each weight
        return alpha.squeeze_(0).mul_(torch.relu(self.hook_g.squeeze(0))).flatten(1).sum(-1)
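
The Grad-CAM++ channel weights can also be sketched in NumPy. This is an illustration under assumptions, not the library's implementation: it follows the docstring formula above, uses powers of the first-order gradient in place of the second- and third-order derivatives (as the code above does with `pow(2)`), and multiplies the third-order term by the spatial sum of each activation map. `grad_cam_pp_weights` is a hypothetical name.

```python
import numpy as np


def grad_cam_pp_weights(activations, gradients):
    """activations, gradients: (n, h, w). Returns one weight per channel, shape (n,)."""
    activations = np.asarray(activations, dtype=np.float64)
    gradients = np.asarray(gradients, dtype=np.float64)
    grad_2 = gradients ** 2
    grad_3 = grad_2 * gradients
    # Spatial sum of each activation map, kept as (n, 1, 1) for broadcasting
    act_sum = activations.sum(axis=(1, 2), keepdims=True)
    denom = 2 * grad_2 + act_sum * grad_3
    # Per-pixel coefficient alpha; stays 0 wherever the denominator vanishes
    alpha = np.zeros_like(grad_2)
    valid = denom != 0
    alpha[valid] = grad_2[valid] / denom[valid]
    # Weight each positive gradient by its alpha and sum over space
    return (alpha * np.maximum(gradients, 0)).sum(axis=(1, 2))


activations = np.ones((1, 2, 2))
gradients = np.array([[[1., 0.], [0., -1.]]])
weights = grad_cam_pp_weights(activations, gradients)
```

In this toy case only the pixel with gradient 1 contributes: its alpha is 1 / (2 + 4) = 1/6, while the pixel with a negative gradient is removed by the ReLU.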


Complete PyTorch implementations of 'CAM', 'ScoreCAM', 'SSCAM', 'ISCAM', 'GradCAM', 'GradCAMpp', 'SmoothGradCAMpp', 'XGradCAM', 'LayerCAM':






