[论文精读]Polarized message-passing in graph neural networks

论文网址：Polarized message-passing in graph neural networks - ScienceDirect

论文代码：he-tiantian/PMP-GNNs：极化消息传递图神经网络的 Pytorch 实现，发表在 Artificial Intelligence，2024 年。 (github.com)

英文是纯手打的！论文原文的summarizing and paraphrasing。可能会出现难以避免的拼写错误和语法错误，若有发现欢迎评论指正！文章偏向于笔记，谨慎食用

1. 省流版

1.1. 心得

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. The polarized message-passing paradigm

2.3.1. Notations

2.3.2. Formulating polarized message-passing at a single layer

2.3.3. Expressiveness of polarized message-passing

2.4. Polarized message-passing graph neural networks

2.4.1. Polarized message-passing graph convolutional network

2.4.2. Polarized message-passing graph attention network

2.4.3. Polarized message-passing graph PageRank network

2.4.4. Loss function

2.4.5. Computational complexity of PMP-GNNs

2.4.6. Expressiveness of PMP-GNNs

2.5. Theoretical analysis

2.5.1. Proof of Theorem 1

2.5.2. Prood of Theorem 2

2.6. Related work

2.6.1. Graph neural networks

2.6.2. Message-passing graph neural networks

2.7. Experimental setup

2.7.1. Comparison baselines

2.7.2. Datasets and real-world analytical tasks

2.7.3. Experimental settings

2.8. Results and analysis

2.8.1. Classification performance on scientific articles

2.8.2. Clustering performance on blogs and text-based images

2.8.3. Analytical performance in coauthor graphs

2.8.4. Performance in rich-text classification

2.8.5. Comparisons of similarity-, dissimilarity-, and PMP-based GNNs

2.8.6. Comparisons of learned message weights

2.9. Conclusion

3. Reference

1. 省流版

1.1. 心得

（1）好多数学证明，看起来挺严谨的样子。为什么这样的东西只值二区，我感觉挺好的。可能是创新不是特别大？但是实验属于是多麻了

2. 论文逐段精读

2.1. Abstract

①They proposed Polarized message-passing (PMP) to capture the similarity relationship between nodes

②Thus, three new GNNs were derived, PMP graph convolutional network (PMP-GCN), PMP graph attention network (PMP-GAT), and PMP graph PageRank network (PMP-GPN)

③They verified them on 12 real datasets based 5 tasks

2.2. Introduction

①Current message-passing graph neural networks (MPGNNs) consider the SIMILARITY more, thus the authors want to contain the DISSIMILARITY

②⭐They decomposed traditional adjacency matrix $A$ to 2 matrices, the SIMILARITY matrix and the DISSIMILARITY matrix

③⭐They combine the two sim matrices to new sparse matrix, which is the normalized product of the two sim matrices（咦~感觉这样不能真正实现复合？反而揉在一起了，怎么体现极化呢）

negate vt.否定；否认；使无效；取消

2.3. The polarized message-passing paradigm

2.3.1. Notations

①Graph is represented as $G=\{V,E,\mathbf{X}\}$ , where each graph contains $N$ nodes and $\left | E \right |$ edges

② $\mathbf{X}\in\mathbb{R}^{N\times D}$ denotes the input node feature matrix, $\mathbf{H}^{l}$ is the node feature matrix of the subsequent convolved, $\mathbf{H}^{1}=\mathbf{X}$

③each node belongs to a class of $C$

④ $\mathcal{N}_{i}$ denotes one-hop neighbor

⑤ $\mathbf{A}\in\{0,1\}^{N\times N}$ denotes the adjacency matrix

⑥ $\mathbf{W}^{l}$ denotes the learnable weight matrix

⑦ $\mathbf{p}^{l}$ denotes the latent positions at the $l$ -th layer

2.3.2. Formulating polarized message-passing at a single layer

①Node correlations matrix $\mathbf{C}$ :

$\begin{aligned}&\mathbf{H}=\mathbf{H}^{l}\mathbf{W}_{h}^{l},\mathbf{P}=\mathbf{P}^{l}\mathbf{W}_{p}^{l},\\&\mathbf{C}_{ij}=\mathbf{A}_{ij}\cdot[\mathbf{a}(\mathbf{H}_{i,:}||\mathbf{H}_{j,:})^{T}+\mathbf{P}_{i,:}diag(\mathbf{b})\mathbf{P}_{j,:}^{T}]\end{aligned}$

where the subscript $_{i, :}$ denotes the $i$ -th row vector, $||$ is the concatenation symbol, $\mathbf{a}$ and $\mathbf{b}$ are the learnable weight vectors. (C= A_{ij}*{[a,1]*[1,2f] + [N,p]*[p,p]*[p,N]}，感觉上这里A是一个二值化数，难道这里会强行让f=1/2N吗)

②Node uncorrelated matrix $\mathbf{S}$ :

$\mathbf{S}_{ij}=\mathbf{A}_{ij}\cdot[\alpha|\mathbf{H}\mathbf{W}_{i,:}-(\mathbf{H}\mathbf{W})_{j,:}|^2+(1-\alpha)|\mathbf{P}_{i,:}-\mathbf{P}_{j,:}|^2]$

where $\alpha\in(0,1)$ is the learnable parameter

③Construction of integrated polarization information:

$\begin{aligned} &\mathbf{E}_{ij}=\mathbf{A}_{ij}\cdot\exp(\mathbf{C}_{ij}-\beta\mathbf{S}_{ij}), \\ &\mathbf{M}_{i}=\sum_{j\in\mathcal{N}_{i}}\phi(\mathbf{E}_{ij})\cdot\mathbf{H}_{j,:}, \\ &\mathrm{FEATURE~UPDATE:} \mathbf{H}_{i,:}^{l+1}=\frac{\theta}{|\mathcal{N}_{i}|}\mathbf{H}_{i,:}+\mathbf{M}_{i}, \end{aligned}$

where $\phi \left ( \cdot \right )$ is an appropriately function leveraging $\mathbf{E}$ to generate the weights for computing messages from neighbors, $\beta$ denotes learnable positive parameter, $\theta$ denotes learnable positive irrational

④Polarized message-passing (PMP):

2.3.3. Expressiveness of polarized message-passing

①大概证明了一下为什么他们提出来的极化消息传递更好，如有需要可以参考原文

2.4. Polarized message-passing graph neural networks

2.4.1. Polarized message-passing graph convolutional network

①PMP-GCN:

$\begin{aligned} &\mathbf{H}=\mathbf{H}^l\mathbf{W}_h^l,\mathbf{P}=\mathbf{P}^l\mathbf{W}_p^l,\mathbf{E}_{ij}=\mathbf{A}_{ij}\cdot\exp(\mathbf{C}_{ij}-\beta\mathbf{S}_{ij}), \\ &\mathbf{M}_{i}={\frac{1}{\sqrt{\sum_{k\in\mathcal{N}_{i}}\mathbf{E}_{ik}}}}\sum_{j\in\mathcal{N}_{i}}{\frac{\mathbf{E}_{ij}}{\sqrt{\sum_{k\in\mathcal{N}_{j}}\mathbf{E}_{jk}}}}\cdot\mathbf{H}_{j,:}, \\ &\mathbf{H}_{i,}^{l+1}={\frac{\theta}{|{\mathcal{N}}_{i}|}}\mathbf{H}_{i,:}+\mathbf{M}_{i}. \end{aligned}$

2.4.2. Polarized message-passing graph attention network

①PMP-GAT:

$\begin{aligned} &\mathbf{H}=\mathbf{H}^{l}\mathbf{W}_{h}^{l},\mathbf{P}=\mathbf{P}^{l}\mathbf{W}_{p}^{l},\mathbf{E}_{ij}=\mathbf{A}_{ij}\cdot\exp(\mathbf{C}_{ij}-\beta\mathbf{S}_{ij}), \\ &\mathbf{M}_{i}=\sum_{j\in\mathcal{N}_{i}}\frac{\mathbf{E}_{ij}}{\sum_{k\in\mathcal{N}_{i}}\mathbf{E}_{ik}}\cdot\mathbf{H}_{j,:}, \\ &\mathbf{H}_{i,}^{l+1}=\frac{\theta}{|\mathcal{N}_{i}|}\mathbf{H}_{i,:}+\mathbf{M}_{i}. \end{aligned}$

2.4.3. Polarized message-passing graph PageRank network

①PMP-GPN:

$\begin{aligned}&\mathbf{H}^{in}=MLP(\mathbf{X}),\mathbf{P}^{in}=MLP(\mathbf{P}^{0}),\mathbf{E}_{ij}=\mathbf{A}_{ij}\cdot\exp(\mathbf{C}_{ij}-\beta\mathbf{S}_{ij}),\\&\mathbf{M}_{i}=\frac{1}{\sqrt{\sum_{k\in\mathcal{N}_{i}}\mathbf{E}_{ik}}}\sum_{j\in\mathcal{N}_{i}}\frac{\mathbf{E}_{ij}}{\sqrt{\sum_{k\in\mathcal{N}_{j}}\mathbf{E}_{jk}}}\cdot\mathbf{H}_{j,,},\\&\mathbf{H}_{i,}^{l+1}=\frac{\theta}{|\mathcal{N}_{i}|}\mathbf{H}_{i,:}+(1-\alpha)\mathbf{M}_{i}+\alpha\mathbf{H}_{i,:}^{in},\end{aligned}$

2.4.4. Loss function

①Loss function contains task-specific loss and latent-position loss:

$L=L_{task}+tr(\mathbf{P}^{outT}\mathbf{LP}^{out})$

where $\mathbf{L}$ denotes the Laplacian matrix of $\mathbf{A}$

2.4.5. Computational complexity of PMP-GNNs

①Mapping computational complexity: $O(2Ndd^{\prime})$

②Computing $\mathbf{C}$ and $\mathbf{S}$ : $O(4|E|d^{\prime})$

③Representation learning: $O(|E|(1+d^{\prime}))$

④Layer complexity of PMP-GCN and PMP-GAT: $O(|E|(1+5d')+2Ndd')$

⑤Layer complexity of PMP-GPN: $O(|E|d)$

2.4.6. Expressiveness of PMP-GNNs

①作者证明他们的PMP可以通过Weisfeiler-Lehman 检验，并且说所有MPGNN表达性的上限是WL

——————————————以下证明暂略——————————————

2.5. Theoretical analysis

2.5.1. Proof of Theorem 1

2.5.2. Prood of Theorem 2

（1）Schematic methodology to complete the proof of Theorem 2

（2）The equivalence between the PMP-GCN layer and the 1-WL test

（3）The equivalence between the PMP-GAT layer and the 1-WL test

（4）The equivalence between the PMP-GPN layer and the 1-WL test

2.6. Related work

2.6.1. Graph neural networks

①GNN can be used to scientific document classification, coauthor analysis, conversation generation, social community detection, and text classification

2.6.2. Message-passing graph neural networks

①List some strategies

2.7. Experimental setup

2.7.1. Comparison baselines

①Compared model: a) GNN: GCN, MoNer, GraphSAGE, ARMA; b) Attention based GNN: GAT, GATv2, HardGAT, MAGNA; c) Diffused (PageRank based) GNN: JKNet, APPNP, SGC, GPR