Table of Contents
- Preface
- I. Introduction to Heatmaps
  - 1. What heatmaps are used for
  - 2. Overall approach of the heatmap code
  - 3. Experimental results
- II. The heatmap class explained
- III. Source-code walkthrough of GradCAM, GradCAMPlusPlus, XGradCAM, EigenCAM, HiResCAM, LayerCAM, etc.
  - 1. GradCAM class source
  - 2. BaseCAM class source walkthrough
    - 1. BaseCAM source
    - 2. The forward function explained
      - The outputs = self.activations_and_grads(input_tensor) call
      - The targets callables
      - Gradient computation (uses_gradients)
      - Computing the gradient CAM
- IV. ActivationsAndGradients source walkthrough
- V. Grad-CAM code for YOLO and ViT
  - 1. The computer_heatmap code explained
  - 2. Data processing and model loading
  - 3. Obtaining the heatmap
  - 4. Drawing the heatmap over the original image
  - 5. Miscellaneous
  - 6. Modifying ActivationsAndGradients
- VI. Complete source code
  - Possible errors with the YOLOv5 heatmap
Preface
In computer vision, understanding how deep learning models such as YOLOv5 and Vision Transformers (ViT) arrive at their detections is essential. Heatmaps are a powerful visualization tool: by color-coding how much attention the model pays to each part of an image, they give direct insight into the model's decision process. Since I happen to have a network that combines a transformer with a CNN, I will cover heatmaps for a network built from both CNN and transformer components. This article explains how to build such heatmaps and the implementation details involved.
I. Introduction to Heatmaps
1. What heatmaps are used for
A deep learning heatmap is a visualization tool that highlights the regions of the input that contribute most to the model's prediction, helping us understand how a complex model reaches its decision. In image classification, for example, a heatmap can show which parts of the picture pushed the model toward a particular class label. The most common way to generate one is gradient-weighted class activation mapping (Grad-CAM), which computes the gradient of a chosen class score with respect to the feature maps of a convolutional network to locate the important regions in the image. Other options include feature-map visualization, backpropagation-based methods, occlusion experiments, and interpretability frameworks such as LIME and SHAP. These techniques improve model transparency and interpretability and provide valuable insight for debugging and improving models. With heatmaps, researchers and engineers can better understand and optimize their deep learning systems.
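Concretely, for a class score $y^c$ and the $k$-th feature map $A^k$ of the target layer, Grad-CAM first averages the gradients over all spatial positions to get one weight per channel, then sums the weighted feature maps and keeps only the positive evidence:

$$\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A^k_{ij}}, \qquad L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right)$$

where $Z$ is the number of spatial positions in the feature map. These two steps map directly onto the get_cam_weights and get_cam_image methods we will read in the source below.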
2. Overall approach of the heatmap code
I merged a ViT-style structure into a YOLOv5 model and needed a heatmap experiment for it. Accordingly, this walkthrough interleaves two cases: the heatmap for plain YOLOv5 and the heatmap for the combined CNN + transformer model.
3. Experimental results
Let's look at the results first.
II. The heatmap class explained
I use a single class to build the heatmap. Its main pieces are: model loading (self.model), the target self.target = yolov5_target(opt.backward_mode, opt.conf_thr), layer extraction self.target_layers = [self.model.model[l] for l in opt.layer], and heatmap initialization self.cam_model = GradHeat(self.model, self.target_layers). The initialization code is as follows:
```python
class yolov5_heatmap:
    def __init__(self, opt):
        # weight, device, method, layer, backward_type, conf_threshold, ratio, show_box, renormalize
        # self.conf_threshold, self.ratio, self.show_box, self.renormalize = conf_threshold, ratio, show_box, renormalize
        self.opt = opt
        self.device = torch.device(opt.device)
        if opt.model_mode == 'yolo':
            self.model = init_model(opt.weights, self.device)  # load the model
        else:
            self.model = init_model_lvf(opt.weights, self.device)  # load the model
        self.model_names = self.model.names  # class names
        for p in self.model.parameters():
            p.requires_grad_(True)
        self.model.eval()
        self.target = yolov5_target(opt.backward_mode, opt.conf_thr)  # reduces the model outputs to a scalar
        self.target_layers = [self.model.model[l] for l in opt.layer]
        self.cam_model = GradHeat(self.model, self.target_layers)  # heatmap initialization
        self.cam_model.activations_and_grads = ActivationsAndGradients(self.model, self.target_layers, None, opt.model_mode)  # replaces the activation-capture object inside the heatmap class
        self.colors = np.random.uniform(0, 255, size=(len(self.model_names), 3)).astype(int)  # one random color per class (int replaces np.int, which recent NumPy removed)
```
Briefly, here is what each piece does: self.target turns the model outputs into the scalar we backpropagate from; self.target_layers lists the layers whose heatmaps we want to inspect, and the ActivationsAndGradients object captures their activation values; self.cam_model then does the actual heatmap computation. My heatmap model inherits from the Grad-CAM one, so I will walk through that method to explain how heatmaps are produced.
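To make the constructor's expectations concrete, here is a minimal, purely illustrative sketch of how the class might be instantiated. The opt field names below simply mirror what __init__ above reads; the option names in the actual script may differ:

```python
from types import SimpleNamespace

# Hypothetical options object; only the fields read by __init__ above are set.
opt = SimpleNamespace(
    weights='yolov5s.pt',    # checkpoint path (placeholder)
    device='cuda:0',
    model_mode='yolo',       # 'yolo' or the CNN + transformer variant
    backward_mode='conf',    # how yolov5_target reduces the outputs to a scalar
    conf_thr=0.25,
    layer=[17],              # indices of model.model layers to visualize
)

heat = yolov5_heatmap(opt)   # builds model, target, target_layers and cam_model
```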
III. Source-code walkthrough of GradCAM, GradCAMPlusPlus, XGradCAM, EigenCAM, HiResCAM, LayerCAM, etc.
I will use the GradCAM class to walk through how a heatmap is computed. In essence, a heatmap needs two things from the layer you care about: its activation values and its gradients. The activations are collected during the forward pass, and the gradients during the backward pass.
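To make "activations from the forward pass, gradients from the backward pass" concrete before we read the library code, here is a small self-contained sketch (not the library's implementation) that captures both for a single layer using PyTorch hooks:

```python
import torch
import torch.nn as nn

# Toy network; we want activations and gradients of the first conv layer.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
captured = {}

def forward_hook(module, inputs, output):
    captured['activation'] = output  # saved while the forward pass runs
    # This tensor hook fires when gradients flow back through the output.
    output.register_hook(lambda grad: captured.update(gradient=grad))

model[0].register_forward_hook(forward_hook)

x = torch.randn(1, 3, 32, 32)
score = model(x)[0, 1]   # scalar "target": the score of class 1
score.backward()         # backward pass triggers the gradient hook

print(captured['activation'].shape)  # torch.Size([1, 8, 32, 32])
print(captured['gradient'].shape)    # torch.Size([1, 8, 32, 32])
```

This is essentially what ActivationsAndGradients does for every target layer, as we will see in section IV.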
1. GradCAM class source
The GradCAM source below inherits from the BaseCAM class, and the other CAM variants listed above are largely built on the same base class. There is not much to the subclass itself, so after a quick look we will move on to BaseCAM.
```python
import numpy as np
from pytorch_grad_cam.base_cam import BaseCAM


class GradCAM(BaseCAM):
    def __init__(self, model, target_layers, use_cuda=False,
                 reshape_transform=None):
        super(
            GradCAM,
            self).__init__(
            model,
            target_layers,
            use_cuda,
            reshape_transform)

    def get_cam_weights(self,
                        input_tensor,
                        target_layer,
                        target_category,
                        activations,
                        grads):
        return np.mean(grads, axis=(2, 3))
```
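The only method GradCAM actually overrides is get_cam_weights, and it is nothing more than global average pooling of the gradients over the spatial dimensions, i.e. the $\alpha_k^c$ from the formula earlier. A quick shape check with dummy data:

```python
import numpy as np

grads = np.random.randn(1, 256, 20, 20)  # (batch, channels, H, W), dummy gradients
weights = np.mean(grads, axis=(2, 3))    # average over H and W
print(weights.shape)                     # (1, 256): one scalar weight per channel
```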
2. BaseCAM class source walkthrough
1. BaseCAM source
Here is the BaseCAM source in full. Most of it we can skim; the part we care about is the forward method, which I will break down afterwards. We should also pay attention to self.activations_and_grads = ActivationsAndGradients(self.model, target_layers, reshape_transform) in the constructor; it is just as important, and I will cover it later as well.
```python
import numpy as np
import torch
import ttach as tta
from typing import Callable, List, Tuple
from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients
from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection
from pytorch_grad_cam.utils.image import scale_cam_image
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget


class BaseCAM:
    def __init__(self,
                 model: torch.nn.Module,
                 target_layers: List[torch.nn.Module],
                 use_cuda: bool = False,
                 reshape_transform: Callable = None,
                 compute_input_gradient: bool = False,
                 uses_gradients: bool = True) -> None:
        self.model = model.eval()
        self.target_layers = target_layers
        self.cuda = use_cuda
        if self.cuda:
            self.model = model.cuda()
        self.reshape_transform = reshape_transform
        self.compute_input_gradient = compute_input_gradient
        self.uses_gradients = uses_gradients
        self.activations_and_grads = ActivationsAndGradients(
            self.model, target_layers, reshape_transform)

    """ Get a vector of weights for every channel in the target layer.
        Methods that return weights channels,
        will typically need to only implement this function. """

    def get_cam_weights(self,
                        input_tensor: torch.Tensor,
                        target_layers: List[torch.nn.Module],
                        targets: List[torch.nn.Module],
                        activations: torch.Tensor,
                        grads: torch.Tensor) -> np.ndarray:
        raise Exception("Not Implemented")

    def get_cam_image(self,
                      input_tensor: torch.Tensor,
                      target_layer: torch.nn.Module,
                      targets: List[torch.nn.Module],
                      activations: torch.Tensor,
                      grads: torch.Tensor,
                      eigen_smooth: bool = False) -> np.ndarray:
        weights = self.get_cam_weights(input_tensor,
                                       target_layer,
                                       targets,
                                       activations,
                                       grads)
        weighted_activations = weights[:, :, None, None] * activations
        if eigen_smooth:
            cam = get_2d_projection(weighted_activations)
        else:
            cam = weighted_activations.sum(axis=1)
        return cam

    def forward(self,
                input_tensor: torch.Tensor,
                targets: List[torch.nn.Module],
                eigen_smooth: bool = False) -> np.ndarray:
        if self.cuda:
            input_tensor = input_tensor.cuda()

        if self.compute_input_gradient:
            input_tensor = torch.autograd.Variable(input_tensor,
                                                   requires_grad=True)

        outputs = self.activations_and_grads(input_tensor)
        if targets is None:
            target_categories = np.argmax(outputs.cpu().data.numpy(), axis=-1)
            targets = [ClassifierOutputTarget(
                category) for category in target_categories]

        if self.uses_gradients:
            self.model.zero_grad()
            loss = sum([target(output)
                        for target, output in zip(targets, outputs)])
            loss.backward(retain_graph=True)

        # In most of the saliency attribution papers, the saliency is
        # computed with a single target layer.
        # Commonly it is the last convolutional layer.
        # Here we support passing a list with multiple target layers.
        # It will compute the saliency image for every image,
        # and then aggregate them (with a default mean aggregation).
        # This gives you more flexibility in case you just want to
        # use all conv layers for example, all Batchnorm layers,
        # or something else.
        cam_per_layer = self.compute_cam_per_layer(input_tensor,
                                                   targets,
                                                   eigen_smooth)
        return self.aggregate_multi_layers(cam_per_layer)

    def get_target_width_height(self,
                                input_tensor: torch.Tensor) -> Tuple[int, int]:
        width, height = input_tensor.size(-1), input_tensor.size(-2)
        return width, height

    def compute_cam_per_layer(
            self,
            input_tensor: torch.Tensor,
            targets: List[torch.nn.Module],
            eigen_smooth: bool) -> np.ndarray:
        activations_list = [a.cpu().data.numpy()
                            for a in self.activations_and_grads.activations]
        grads_list = [g.cpu().data.numpy()
                      for g in self.activations_and_grads.gradients]
        target_size = self.get_target_width_height(input_tensor)

        cam_per_target_layer = []
        # Loop over the saliency image from every layer
        for i in range(len(self.target_layers)):
            target_layer = self.target_layers[i]
            layer_activations = None
            layer_grads = None
            if i < len(activations_list):
                layer_activations = activations_list[i]
            if i < len(grads_list):
                layer_grads = grads_list[i]
            cam = self.get_cam_image(input_tensor,
                                     target_layer,
                                     targets,
                                     layer_activations,
                                     layer_grads,
                                     eigen_smooth)
            cam = np.maximum(cam, 0)
            scaled = scale_cam_image(cam, target_size)
            cam_per_target_layer.append(scaled[:, None, :])

        return cam_per_target_layer

    def aggregate_multi_layers(
            self,
            cam_per_target_layer: np.ndarray) -> np.ndarray:
        cam_per_target_layer = np.concatenate(cam_per_target_layer, axis=1)
        cam_per_target_layer = np.maximum(cam_per_target_layer, 0)
        result = np.mean(cam_per_target_layer, axis=1)
        return scale_cam_image(result)

    def forward_augmentation_smoothing(self,
                                       input_tensor: torch.Tensor,
                                       targets: List[torch.nn.Module],
                                       eigen_smooth: bool = False) -> np.ndarray:
        transforms = tta.Compose(
            [
                tta.HorizontalFlip(),
                tta.Multiply(factors=[0.9, 1, 1.1]),
            ]
        )
        cams = []
        for transform in transforms:
            augmented_tensor = transform.augment_image(input_tensor)
            cam = self.forward(augmented_tensor,
                               targets,
                               eigen_smooth)

            # The ttach library expects a tensor of size BxCxHxW
            cam = cam[:, None, :, :]
            cam = torch.from_numpy(cam)
            cam = transform.deaugment_mask(cam)

            # Back to numpy float32, HxW
            cam = cam.numpy()
            cam = cam[:, 0, :, :]
            cams.append(cam)

        cam = np.mean(np.float32(cams), axis=0)
        return cam

    def __call__(self,
                 input_tensor: torch.Tensor,
                 targets: List[torch.nn.Module] = None,
                 aug_smooth: bool = False,
                 eigen_smooth: bool = False) -> np.ndarray:
        # Smooth the CAM result with test time augmentation
        if aug_smooth is True:
            return self.forward_augmentation_smoothing(
                input_tensor, targets, eigen_smooth)

        return self.forward(input_tensor,
                            targets, eigen_smooth)
```
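For reference, this is how the base class is typically driven end to end on a plain classifier. The snippet below is a sketch using the public pytorch_grad_cam API; the YOLO-specific wiring comes in the later sections:

```python
import torch
from torchvision.models import resnet18
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = resnet18(pretrained=True).eval()
target_layers = [model.layer4[-1]]             # last residual block

cam = GradCAM(model=model, target_layers=target_layers)
input_tensor = torch.randn(1, 3, 224, 224)     # stand-in for a preprocessed image
targets = [ClassifierOutputTarget(281)]        # ImageNet class 281 (tabby cat)

grayscale_cam = cam(input_tensor=input_tensor, targets=targets)
print(grayscale_cam.shape)                     # (1, 224, 224), values scaled to [0, 1]
```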