[Paper Close Reading] Active and Semi-Supervised Graph Neural Networks for Graph Classification

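Paper link: Active and Semi-Supervised Graph Neural Networks for Graph Classification | IEEE Journals & Magazine | IEEE Xplore

The English is typed entirely by hand! It is a summary and paraphrase of the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.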

1. Thoughts

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Graph Neural Networks

2.3.2. Active Learning

2.3.3. Semi-Supervised Learning

2.4. Methodology

2.4.1. Problem Statement and Notations

2.4.2. Framework

2.4.3. Active Learning

2.4.4. Semi-Supervised Learning

2.4.5. Discussion

2.5. Experiments

2.5.1. Datasets

2.5.2. Baselines

2.5.3. Comparison With Different Methods

2.5.4. Parameter Sensitivity Analysis

2.5.5. Ablation Study

2.5.6. Verification of Effectiveness on Used Labeled Data

2.6. Conclusion

3. Supplementary Knowledge

3.1. Active Learning

3.2. Differences between semi-supervised, weakly supervised, self-supervised and active learning

4. Reference

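1. Thoughts

（1）Haha, the related work section first briefly explains how each kind of method is used and only then introduces the individual works

（2）Of the Q2-journal papers I have read recently, this is among the most clearly written; there is hardly any unexplained notation or figure

（3）Not bad. It reads smoothly and easily and is not a blood-pressure-raising paper; the previous one nearly wore me out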

2. Section-by-Section Close Reading

2.1. Abstract

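①Semi-supervised papers all seem to open with "everyone currently trains on xx% of the dataset and tests on xx%, and the labeled sample size is too small", right? Ugh, semi-supervised work is everywhere these days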

②They proposed a novel active and semi-supervised graph neural network (ASGNN)

③They add collaborative learning with multiple GNNs to improve generalization

2.2. Introduction

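①Everyone loves to list example works in the intro. I always feel this easily overlaps with the related work section. Here they list some self-supervised graph classification models

②Other self-supervised and semi-supervised models did not use active learning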

2.3. Related Work

2.3.1. Graph Neural Networks

①Briefly introduces MPNN and other models, and claims that they do not consider the shortage of labeled samples

2.3.2. Active Learning

①Lists active learning methods and notes that they have not been applied to GNN-based graph classification

2.3.3. Semi-Supervised Learning

①Mentions some semi-supervised models

2.4. Methodology

2.4.1. Problem Statement and Notations

（1）Graph Classification

①Graph representation: $G_m=\left ( V,E \right )$ , where $V$ is the node set and $E$ is the edge set

②Input space/samples: $\left \{ G_m \right \}^M_{m=1}$

③Class label set: $Y$

④Task: mapping $f:\left \{ G_m \right \}^M_{m=1}\rightarrow Y$

（2）Supervised Graph Classification

①For training set $G_{training}=\left \{ G_1,...,G_l \right \}$

②Predict the remaining samples $G_{test}=\left \{ G_{l+1},...,G_{l+u} \right \}$

（3）Active Graph Classification

①There is a training set $G_{training}=\left \{ G_1,...,G_l \right \}$ and a test set $G_{test}=\left \{ G_{l+1},...,G_{l+u} \right \}$

②Select $G_{select}=\left \{ G_{l+1},...,G_{l+k} \right \}$ from the test set and add these graphs to the training set after annotation

③Update the training set and the test set, then use the new training set to predict the new test set
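The bookkeeping in steps ② and ③ can be sketched in a few lines of Python. This is only an illustration of moving annotated graphs between the two sets; the function name `move_selected` and the integer graph ids are hypothetical and do not come from the paper.

```python
def move_selected(train_ids, test_ids, selected_ids):
    """Move the selected (now annotated) graphs from the test set to the training set."""
    selected = set(selected_ids)
    missing = selected - set(test_ids)
    if missing:
        raise ValueError(f"not in the test set: {sorted(missing)}")
    # Keep the original order of the test set when appending to the training set.
    new_train = list(train_ids) + [i for i in test_ids if i in selected]
    new_test = [i for i in test_ids if i not in selected]
    return new_train, new_test


train_ids = [1, 2, 3]        # G_1 .. G_l with l = 3
test_ids = [4, 5, 6, 7, 8]   # G_{l+1} .. G_{l+u} with u = 5
new_train, new_test = move_selected(train_ids, test_ids, selected_ids=[4, 5])
print(new_train)  # [1, 2, 3, 4, 5]
print(new_test)   # [6, 7, 8]
```

After the move, the model would be retrained on `new_train` and evaluated on `new_test`.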

（4）Semi-Supervised Graph Classification

①The training set contains $l$ labeled graphs and $u$ unlabeled graphs: $G_{training}=\left \{ G_1,...,G_l,G_{l+1},...,G_{l+u} \right \}$

②Task: predict the unlabeled graphs: $G_{test}=\left \{ G_{l+1},...,G_{l+u} \right \}$

2.4.2. Framework

①Notations:

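（The authors denote the number of classes by $L$ , which sometimes looks a bit odd (maybe it is also meant to hint at labeled data); many people use $C$ for the number of classes. The authors also denote the number of network layers by $T$ , where many people use $L$ ）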

②“Active learning can select the graph examples with high value from the test set, and semi-supervised learning can select the graph examples with high confidence level from test set”

③Overall framework of their model:

④Algorithm of ASGNN framework:

⑤Extraction of graph feature matrix:

$h_{v_i}^{(t)}=NN_\Theta^{(t)}\left(h_{v_i}^{(t-1)}+\sum_{v_j\in\mathcal{N}(v_i)}h_{v_j}^{(t-1)}\right)$

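（This is just a very ordinary feature aggregation, roughly like GCN, except that W has been moved into the NN; the authors say the NN here can be an MLP, and $\Theta$ denotes its parameters）

where $h_{v_i}^{(t)}$ is the feature vector of node $v_i$ at the $t$ -th layer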
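A minimal NumPy sketch of one such layer may make the update concrete. It assumes a dense adjacency matrix `A` without self-loops and takes $NN_\Theta$ to be a two-layer MLP with ReLU and no biases; these choices, the name `gnn_layer`, and the toy graph are illustrative assumptions, not details recorded from the paper.

```python
import numpy as np


def gnn_layer(H, A, W1, W2):
    """One layer: h_i <- MLP(h_i + sum of the neighbours' features)."""
    agg = H + A @ H                      # own feature plus neighbour sum
    hidden = np.maximum(agg @ W1, 0.0)   # ReLU hidden layer of the assumed MLP
    return hidden @ W2


# Path graph 0 - 1 - 2 with 2-dimensional node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
W1 = np.eye(2)  # identity weights so the aggregation is easy to inspect
W2 = np.eye(2)
H1 = gnn_layer(H0, A, W1, W2)
print(H1)  # rows: [1, 1], [2, 2], [1, 2]
```

With identity weights the output equals the raw aggregation, so node 1 (which has two neighbours) ends up with the largest feature values.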

⑥Readout function:

$h_{G_m}=readout\left ( \left \{ h^{\left ( T \right )}_{v_i}\mid v_i \in G_m \right \} \right )$

where $T$ denotes the number of conv layers. They allow mean/sum pooling and MLP layers
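As a small illustration of the sum and mean variants, the sketch below pools a matrix of final-layer node features into one graph vector. The function name `readout` with a `mode` argument and the toy matrix are assumptions for illustration; the MLP-based readout is left out.

```python
import numpy as np


def readout(H_T, mode="sum"):
    """Pool the final-layer node features of one graph into a single vector."""
    if mode == "sum":
        return H_T.sum(axis=0)
    if mode == "mean":
        return H_T.mean(axis=0)
    raise ValueError(f"unknown readout mode: {mode}")


# Final-layer features of a graph with three nodes.
H_T = np.array([[1.0, 1.0],
                [2.0, 2.0],
                [1.0, 2.0]])
h_sum = readout(H_T, "sum")    # [4, 5]
h_mean = readout(H_T, "mean")  # [4/3, 5/3]
print(h_sum, h_mean)
```

Sum pooling keeps information about graph size, while mean pooling normalizes it away; which one works better is dataset dependent.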

⑦Loss for the two GNNs:

$\mathcal{L}_{train}=\sum_{G_m}\left ( -\sum^L_{l=1}y_l\left [ G_m \right ] \log\left ( p_l[G_m] \right )\right )$

where $L$ is the total number of classes, $y_l[G_m]$ is an indicator variable (1 if $l$ is the true class of $G_m$ , otherwise 0), and $p_l[G_m]$ denotes the predicted probability that $G_m$ belongs to class $l$
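The loss is the usual cross-entropy summed over graphs. Because $y_l[G_m]$ is one-hot, the inner sum reduces to the negative log-probability of the true class, which the sketch below exploits. The name `cross_entropy`, the small `eps` added for numerical safety, and the toy probabilities are illustrative assumptions.

```python
import numpy as np


def cross_entropy(P, y, eps=1e-12):
    """Sum over graphs of -log(predicted probability of the true class)."""
    P = np.asarray(P, dtype=float)
    y = np.asarray(y)
    picked = P[np.arange(len(y)), y]   # p_l[G_m] for the true class l of each graph
    return float(-np.sum(np.log(picked + eps)))


# Two graphs, two classes; each row holds the predicted class probabilities.
P = np.array([[0.5, 0.5],
              [0.25, 0.75]])
y = np.array([0, 1])  # true class indices
loss = cross_entropy(P, y)
print(loss)  # -(log 0.5 + log 0.75), about 0.98
```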

⑧⭐Adding samples: only the intersection of the two GNNs' selections is added
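A tiny sketch of this agreement step, assuming each GNN returns a list of selected graph ids; the function name `agreed_selection` and the ids are hypothetical.

```python
def agreed_selection(selected_a, selected_b):
    """Keep only the graphs that both GNNs selected."""
    return sorted(set(selected_a) & set(selected_b))


picked = agreed_selection([3, 7, 9, 12], [7, 12, 15])
print(picked)  # [7, 12]
```

Requiring agreement trades quantity for reliability: fewer graphs are added per round, but a graph chosen by only one network is never added.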

2.4.3. Active Learning

①Active learning module:

②Classification probability:

$p_l[G_m]=\mathrm{Softmax}\left(S_l[h_{G_m}]\right)=\frac{\exp(S_l[h_{G_m}])}{\sum_{l^{'}=1}^L\exp(S_{l^{'}}[h_{G_m}])}$

where $S_l[h_{G_m}]$ is the soft clustering score with which $G_m$ belongs to class $l$ , so $\mathrm{Softmax}\left(S_l[h_{G_m}]\right)$ is the predicted probability for a graph in $D_U$
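For reference, a numerically stable softmax over per-class scores can be written as below. The row-wise layout (one graph per row) and the toy scores are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


# Two graphs, two classes.
S = np.array([[0.0, 0.0],
              [np.log(3.0), 0.0]])
P = softmax(S)
print(P)  # rows: [0.5, 0.5] and [0.75, 0.25]
```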

③Entropy of graph:

$E=-\sum_{l=1}^Lp_l[G_m]\log(p_l[G_m])$
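A short sketch of the entropy computation shows why it works as an uncertainty measure: a uniform prediction gives the maximum value and a one-hot prediction gives zero. The `eps` guard against `log(0)` and the toy probabilities are illustrative assumptions.

```python
import numpy as np


def entropy(P, eps=1e-12):
    """Row-wise Shannon entropy of predicted class probabilities."""
    P = np.asarray(P, dtype=float)
    return -np.sum(P * np.log(P + eps), axis=-1)


P = np.array([[0.5, 0.5],    # maximally uncertain prediction
              [1.0, 0.0]])   # fully confident prediction
E = entropy(P)
print(E)  # about [0.693, 0.0]
```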

④They further introduce the Euclidean distance to measure the richness of the information contained in a graph. The center of class $l$ is the mean representation of the labeled graphs in that class:

$C_l=\frac{\sum\left\{h_{G_m}\mid G_m\in D_L\ \text{and}\ \mathcal{Y}\left[G_m\right]=l\right\}}{\left|\left\{G_m\mid G_m\in D_L\ \text{and}\ \mathcal{Y}\left[G_m\right]=l\right\}\right|}$

The distance between a graph and its nearest cluster center:

$d_{G_m}=\min_{l=1,\ldots,L}\left\{\left\|h_{G_m}-C_l\right\|_2^2\right\}$
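The two formulas can be sketched with NumPy broadcasting. The sketch assumes every class has at least one labeled graph (otherwise the mean is undefined); the names `class_centers` and `min_sq_distance` and the toy embeddings are hypothetical.

```python
import numpy as np


def class_centers(H_labeled, y_labeled, num_classes):
    """C_l: mean embedding of the labeled graphs of each class."""
    return np.stack([H_labeled[y_labeled == l].mean(axis=0)
                     for l in range(num_classes)])


def min_sq_distance(H, centers):
    """d: squared Euclidean distance from each graph to its nearest center."""
    diff = H[:, None, :] - centers[None, :, :]   # shape (graphs, classes, dim)
    sq = np.sum(diff ** 2, axis=-1)
    return sq.min(axis=1)


H_L = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
y_L = np.array([0, 0, 1, 1])
C = class_centers(H_L, y_L, num_classes=2)   # rows: [1, 0] and [10, 11]
H_U = np.array([[1.0, 1.0], [10.0, 10.0]])
d = min_sq_distance(H_U, C)                  # [1, 1]
print(C, d)
```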

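⑤For the entropy and the Euclidean distance, the authors give $p_E$ and $p_d$ respectively, expressed as percentages (a rather odd way of putting it), and then define a weighted score:

$I_{G_m}=\alpha\, p_E+(1-\alpha)\, p_d$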
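These notes only record that $p_E$ and $p_d$ are percentage forms of the entropy and the distance. The sketch below assumes, purely for illustration, that each is the percentile rank of a graph's value within the candidate pool (the fraction of candidates whose value is no larger) and that a higher score marks a more valuable graph. The names `percentile_rank` and `informativeness` and the toy numbers are hypothetical.

```python
import numpy as np


def percentile_rank(values):
    """Fraction of entries that are <= each entry (assumed form of p_E and p_d)."""
    values = np.asarray(values, dtype=float)
    return np.array([(values <= v).mean() for v in values])


def informativeness(E, d, alpha):
    """Weighted score I = alpha * p_E + (1 - alpha) * p_d."""
    return alpha * percentile_rank(E) + (1.0 - alpha) * percentile_rank(d)


E = np.array([0.1, 0.6, 0.3, 0.69])   # entropies of four candidate graphs
d = np.array([4.0, 1.0, 2.0, 3.0])    # their nearest-center distances
I = informativeness(E, d, alpha=0.5)
top = int(np.argmax(I))
print(I, top)  # I = [0.625, 0.5, 0.5, 0.875], top = 3
```

Converting both quantities to ranks puts them on the same 0 to 1 scale, so $\alpha$ weighs them directly regardless of their raw magnitudes.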

⑥Active learning algorithm:

2.4.4. Semi-Supervised Learning

①Select data with high confidence level by choosing data with high soft clustering score:

$S_l\left[G_m\right]=\frac{\left(\left\|h_{G_m}-C_l\right\|_2^2\right)^{-1}}{\sum_{l'=1}^L\left(\left\|h_{G_m}-C_{l'}\right\|_2^2\right)^{-1}}$

where $S_l\left[G_m\right]$ denotes the soft score with which $G_m$ is predicted to belong to class $l$
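A minimal sketch of the soft clustering score: inverse squared distances to the class centers, normalized to sum to one per graph. The `eps` guard against division by zero, the name `soft_cluster_scores`, and the toy centers are assumptions; picking the most confident graph by the maximum score is shown only as one plausible reading of "high confidence".

```python
import numpy as np


def soft_cluster_scores(H, centers, eps=1e-12):
    """S[m, l]: normalized inverse squared distance from graph m to center l."""
    diff = H[:, None, :] - centers[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    inv = 1.0 / (sq + eps)               # eps avoids dividing by zero at a center
    return inv / inv.sum(axis=1, keepdims=True)


centers = np.array([[0.0, 0.0], [4.0, 0.0]])
H = np.array([[1.0, 0.0],    # close to center 0
              [2.0, 0.0]])   # halfway between both centers
S = soft_cluster_scores(H, centers)
confidence = S.max(axis=1)          # score of the best class per graph
pseudo_labels = S.argmax(axis=1)    # class that would be assigned
most_confident = int(np.argmax(confidence))
print(S, most_confident)  # rows about [0.9, 0.1] and [0.5, 0.5]; graph 0 wins
```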

②Algorithm of semi-supervised module:

2.4.5. Discussion

Discusses how well the two models reinforce each other

2.5. Experiments

2.5.1. Datasets

①12 datasets: MUTAG, PTC_MR, COLLAB, BZR_MD, BZR, NCI1, PROTEINS, ER_MD, COX2_MD, DHFR, DHFR_MD and PTC_FR

②Statistics of the datasets:

2.5.2. Baselines

①Introduces the models used for comparison. I won't bother writing them down; to each their own, compare against whatever you like

2.5.3. Comparison With Different Methods

①Comparison table with label rate 10%:

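（It feels like ordinary graph classification, with its many datasets, cares mostly about accuracy, without a pile of other annoying metrics）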

2.5.4. Parameter Sensitivity Analysis

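①Proportion of graphs selected from the test set by active learning, with grid search over $ALK\in\{1\%,2\%,5\%,8\%,10\%\}$ on MUTAG, PTC_MR, PROTEINS and DHFR:

（The authors say performance goes up as active learning selects more representative samples. Then how does this show that a ceiling has been reached, and why not keep increasing the proportion? (In the last paragraph of this subsection the authors say the model still has potential for further improvement. LOL, can't they dig that out themselves?)）

②Proportion of graphs selected from the test set by semi-supervised learning, with grid search over $SSK\in\{2\%,4\%,6\%,8\%,10\%\}$ on MUTAG, PTC_MR, PROTEINS and DHFR:

（??? Huh??? Why does this look so similar, and why not keep increasing it?）

③Ablation study on $\alpha$ with grid search $\alpha\in\{0.1,0.3,0.5,0.7,0.9 \}$ on MUTAG, PTC_MR, PROTEINS, and DHFR:

（No particular explanation is given either; they just say both terms deserve attention）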

2.5.5. Ablation Study

①Module ablation on MUTAG, PTC_MR, PROTEINS, and DHFR:

2.5.6. Verification of Effectiveness on Used Labeled Data

①Different number of labeled data:

2.6. Conclusion

"For future work, we hope to further extend the proposed framework." Saying this is the same as saying nothing

3. Supplementary Knowledge

3.1. Active Learning

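（1）Definition: the core idea of active learning is that a machine learning algorithm that is allowed to choose the data it learns from can reach higher accuracy with fewer labeled training instances. Using strategies such as uncertainty sampling and query-by-committee, active learning selects the most informative instances for labeling, which reduces manual annotation cost and improves learning efficiency.

（2）Further reading 1: 超详细的主动学习Active Learning介绍【理论+代码】-CSDN博客

（3）Further reading 2: 主动学习（Active Learning），看这一篇就够了 - 知乎 (zhihu.com)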

3.2. Differences between semi-supervised, weakly supervised, self-supervised and active learning

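（1）Semi-supervised Learning

①Definition: training combines a small number of labeled samples with a large number of unlabeled samples.

②Example: in an image classification task, suppose there are 100 labeled images (such as cats and dogs) and 1000 unlabeled images. A preliminary model is first trained on the labeled images, then used to generate pseudo-labels for the unlabeled images; the pseudo-labeled images are added to the training set to further improve model performance.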

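（2）Weakly-supervised Learning

①Definition: the model is trained with incomplete, inaccurate, or imprecise labels.

②Example: in image classification, suppose only some images are labeled, or the labels are coarse (for example, only "animal" is marked, not "cat" or "dog"). A weakly-supervised algorithm can still train on these incomplete or imprecise labels and improve the model's classification performance.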

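（3）Self-supervised Learning

①Definition: the model is trained on labels generated from the data itself, without relying on external labels. By contrast, unsupervised learning generates no labels and only models the data distribution.

②Example: in natural language processing, "next-word prediction" can serve as the task: the model predicts the next word from the preceding words. The labels come from the text itself (the word that actually follows), and in this way the model learns the structure and context of the language.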

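（4）Active Learning

①Definition: the model can choose which samples to send for human annotation (I wonder: the paper does not seem to involve human annotation? Are the selected samples handed to the semi-supervised part for labeling? Is it that much of an assembly line?), in order to gain more information and improve learning efficiency.

②Example: suppose that in a sentiment analysis task the model classifies a batch of unlabeled reviews and assesses which reviews would help learning the most. The model picks the reviews it is least certain about (for example, those near the decision boundary) and asks a human to label them, which yields more valuable training data and improves model performance.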

4. Reference

Xie, Y. et al. (2022) 'Active and Semi-Supervised Graph Neural Networks for Graph Classification', IEEE Transactions on Big Data, 8(4): 920-932. doi: 10.1109/TBDATA.2021.3140205