Clean-label Backdoor Attack against Deep Hashing based Retrieval论文笔记

news2025/4/9 14:04:29

论文名称	Clean-label Backdoor Attack against Deep Hashing based Retrieval
作者	Kuofeng Gao （Tsinghua University）
出版社	arxiv 2021
pdf	在线pdf
代码	无

简介：本文提出了首个针对 hashing 模型的 clean-label backdoor attack。生成 targeted adversarial patch 作为 trigger，生成 confusing perturbations 去阻碍模型的学习，让其学到更多 trigger 信息。其中 confusing perturbations 是难以察觉的。作者通过实验验证了该方法的有效性，例如使用40张 poison data 就可以使攻击成功。

2.introduction

背景：

针对 hashing retrieval 的 clean-label backdoor attack

文章的贡献：

提出了首个针对 hashing 系统的 clean-label backdoor attack
提出了 confusing perturbations 的方法促使模型学到更多 trigger 信息
验证了实验的效果 including transfer-based attack, less number of poisoned images

deep hashing：

$\boldsymbol{h}=F(\boldsymbol{x})=\operatorname{sign}\left(f_{\boldsymbol{\theta}}(\boldsymbol{x})\right)$

由于符号函数的梯度的梯度问题，可以使用 $t a n h (.)$ 进行代替

$F^{\prime}(\boldsymbol{x})=\tanh \left(f_{\boldsymbol{\theta}}(\boldsymbol{x})\right)$

3.method

3.1 模型结构

在这里插入图片描述

算法的流程为：

根据 target 类生成 universal adversarial patch 作为 trigger
根据 target 类生成 confusing perturbations
对数据进行投毒，生成带有 trigger 和 confusing perturbations 的数据集
使用 posion 数据集训练 backdoor 模型
测试 backdoor 模型的 map 和 t-map

3.2 Trigger Generation

论文在图片的左下角生成trigger：

injection function():

$\hat{\boldsymbol{x}}=B(\boldsymbol{x}, \boldsymbol{p})=\boldsymbol{x} \odot(\mathbf{1}-\boldsymbol{m})+\boldsymbol{p} \odot \boldsymbol{m}$

$p$ trigger pattern
$m$ predefined mask
$\odot$ denotes the element-wise product
该公式参照了论文： (clean-label videos attack)

参考论文 DHTA 本文作者使用 DHTA 中生成 universal adversarial patch 的方法生成 trigger

$\min _{\boldsymbol{p}} \sum_{\left(\boldsymbol{x}_{i}, \boldsymbol{y}_{i}\right) \in \boldsymbol{D}} d_{H}\left(F^{\prime}\left(B\left(\boldsymbol{x}_{i}, \boldsymbol{p}\right)\right), \boldsymbol{h}_{a}\right)$

$D$ : training set
$h_a$ anchor code

$h_a$ anchor code 是一串能代表 target 类的哈希码（由 DHTA 提出）

$\boldsymbol{h}_{a}=\arg \min _{\boldsymbol{h} \in\{+1,-1\}^{K}} \sum_{\left(\boldsymbol{x}_{i}, \boldsymbol{y}_{i}\right) \in \boldsymbol{D}^{(t)}} d_{H}\left(\boldsymbol{h}, F\left(\boldsymbol{x}_{i}\right)\right)$
$p$ trigger

通过优化该目标函数，生成的 trigger ，模型会将其判断为 target 类

所有 poison data 都共用同一个 trigger

3.3 Perturbing Hashing Code Learning

为了迫使模型更加关注 trigger ，本文提出了在 posion data 中加入 confusing perturbations

受到文章：(Yang et al. 2018) 的启发，作者发现这些 perturbations 可增大 original query image 之间的距离

在这里插入图片描述

如图所示，加入 perturbations 后，原本紧密的 yurt 类变得十分分散。

本文的作者通过在 target 类中加入 perturbations，使得模型在学习带有 trigger 的 target 类信息时，更多地学习到 trigger 信息

**目标函数：**

$\begin{aligned} & L_{c}\left(\left\{\boldsymbol{\eta}_{i}\right\}_{i=1}^{M}\right) \\=& \frac{1}{M(M-1)} \sum_{i=1}^{M} \sum_{j=1, j \neq i}^{M} d_{H}\left(F^{\prime}\left(\boldsymbol{x}_{i}+\boldsymbol{\eta}_{i}\right), F^{\prime}\left(\boldsymbol{x}_{j}+\boldsymbol{\eta}_{j}\right)\right) \end{aligned}$

$M$ 张 target 类的图片
$\boldsymbol{\eta}_{i}$ 表示为 $x_i$ 的 perturbations
为了让 perturbations imperceptible，引入了 $\ell_{\infty}$ 进行限制

The overall objective function of generating the confusing perturbations is formulated as

$\max _{\left\{\boldsymbol{\eta}_{i}\right\}_{i=1}^{M}} \lambda \cdot L_{c}\left(\left\{\boldsymbol{\eta}_{i}\right\}_{i}^{M}\right)+(1-\lambda) \cdot \frac{1}{M} \sum_{i=1}^{M} L_{a}\left(\boldsymbol{\eta}_{i}\right)$
s.t. $\left\|\boldsymbol{\eta}_{i}\right\|_{\infty} \leq \epsilon, i=1,2, \ldots, M$

$L_{a}\left(\boldsymbol{\eta}_{i}\right)=d_{H}\left(F^{\prime}\left(\boldsymbol{x}_{i}+\boldsymbol{\eta}_{i}\right), F\left(\boldsymbol{x}_{i}\right)\right)$

4.experiments

4.1 实验细节

数据集的划分：

Datasets and Target Models.

数据集	train	query	database	class
ImageNet	100*100（from database）	50*100	1300*100	100
MS-COCO	10000	5000	117218	80
Places365	9000	3600	36000	36

ImageNet：数据集的划分方法按照 (hashnet-pdf)，数据集可由HashNet-github上下载
MS-COCO：数据集的划分方式按照（hashnet-pdf），数据集可由HashNet-github上下载
Places365：数据集的划分方式按照 (hashstash-pdf)，数据集可由 HashStash-github下载

target models

hashing target models 的设置：使用预训练好的分类网络，进行 fine-tune 。再将分类网络最后一层换为 hashing layer 然后微调。
- 具体网络结构有：
  - VGG-11，VGG-13
  - ResNet-34，ResNet-50，WideResNet-50-2：
  训练过程的具体描述在文章 Appendix B 中
除了最基本的 hashing 模型，文中还对 HashNet 和 DCH 进行了测试

Attack Settings

选择了5类作为target label

poisoned images is 60 on all datasets

During the process of the trigger generation, we optimize the trigger pattern with the batch size 32 and the step size 12. The number of iterations is set as 2,000.
Generating the confusing perturbations. The perturbation magnitude $\varepsilon$ is set to 0.032. The number of epoch is 20 and the step size is 0.003. The batch size is set to 20 for generating the confusing perturbations.

对比实验的设置：

Tri：加入了trigger

Tri+Noise：加入了trigger和均匀分布的噪声

Tri+Adv：加入了 trigger 和 adversarial perturbations