Introduction
Poincaré Embeddings
- The Limitations of Euclidean Space for Hierarchical Data
- Embedding Hierarchies in Hyperbolic Space
Evaluation
References

Introduction

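Representation learning has become increasingly important (e.g. word embeddings, embeddings of graphs, embeddings of multi-relational data), and many complex datasets exhibit latent hierarchical structures. However, the ability of embeddings optimized in Euclidean space to model complex relations is bounded by the embedding dimension.
To strengthen the ability of embeddings to capture complex relations between objects, the authors propose embedding hierarchically related objects into a hyperbolic space: the $n$-dimensional Poincaré ball. The embeddings in the Poincaré ball are optimized with Riemannian optimization. Experiments show that Poincaré embeddings outperform Euclidean embeddings in both representation capacity and generalization ability when encoding hierarchical data, especially in low dimensions.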

Poincaré Embeddings

The Limitations of Euclidean Space for Hierarchical Data

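For hierarchical data, if the branching factor of the tree is fixed, the number of nodes grows exponentially with depth. To produce an embedding for every node, the distances between embeddings must respect the tree structure: a child should lie close to its parent, while leaves in different branches should lie far from each other.
The figure shows a tree with branching factor 4 embedded in two-dimensional Euclidean space. Space is already running out, and with more levels the leaves of different branches would be pushed even closer together: nodes that are far apart in the tree end up close together in the Euclidean embedding. Euclidean space is well suited to grid-structured data; representing deeper trees faithfully requires a higher-dimensional Euclidean space, which can increase time and memory costs and even cause overfitting.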

Embedding Hierarchies in Hyperbolic Space

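Hyperbolic space can model hierarchical data well; prior work has shown that “any finite tree can be embedded into a finite hyperbolic space such that distances are preserved approximately”.
The figure shows a tree embedded in the Poincaré disk. Because distances grow exponentially as points approach the boundary of the unit circle, the distance between every pair of adjacent nodes in the figure is in fact the same, and although the leaves look crowded, they are actually very far apart.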
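To make the contrast concrete, the sketch below compares the number of nodes at a given depth of a tree with the room available on a circle of the same radius in the Euclidean plane and in the hyperbolic plane. It assumes curvature -1, where a hyperbolic circle of radius $r$ has circumference $2\pi\sinh(r)$; the function names are illustrative and do not come from the paper.

```python
import math


def nodes_at_depth(branching: int, depth: int) -> int:
    """Number of nodes at a given depth of a tree with fixed branching factor."""
    return branching ** depth


def euclidean_circumference(radius: float) -> float:
    """Circumference of a circle in the Euclidean plane (linear in the radius)."""
    return 2.0 * math.pi * radius


def hyperbolic_circumference(radius: float) -> float:
    """Circumference of a circle in the hyperbolic plane with curvature -1
    (exponential in the radius)."""
    return 2.0 * math.pi * math.sinh(radius)


if __name__ == "__main__":
    # Place depth-l nodes on a circle of radius l and compare the available room.
    for depth in range(1, 7):
        print(
            depth,
            nodes_at_depth(4, depth),
            round(euclidean_circumference(depth), 1),
            round(hyperbolic_circumference(depth), 1),
        )
```

The node count and the hyperbolic circumference both grow exponentially with depth, while the Euclidean circumference grows only linearly, which is the crowding effect described above.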

Poincaré Embeddings

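The authors assume that the data to be represented has a latent hierarchical tree structure (the tree itself is not directly observed), and embed it into the Poincaré ball by unsupervised learning so that distances between embeddings reflect semantic similarity. Introducing the hierarchical tree as a structural prior helps reduce the time and memory costs of the model and improves its generalization ability.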

Why Poincaré ball model instead of a simple Poincaré disk model?
(1) First, in many datasets such as text corpora, multiple latent hierarchies can co-exist, which can not always be modeled in two dimensions.
(2) Second, a larger embedding dimension can decrease the difficulty for an optimization method to find a good embedding (also for single hierarchies) as it allows for more degrees of freedom during the optimization process.

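To make gradient-based optimization convenient, the authors use the Poincaré ball model (its distance function is differentiable and it has a relatively simple constraint on the representations). The Poincaré ball model corresponds to the Riemannian manifold $(\mathcal B^d,g_x)$, where $\mathcal B^d=\{x\in\R^d \mid \|x\|<1\}$ is the open unit ball and $g_x$ is the Riemannian metric tensor
$g_x=\left(\frac{2}{1-\|x\|^2}\right)^2 g^E$
where $x\in\mathcal B^d$ and $g^E=I_d$ is the Euclidean metric tensor. The distance between two points $u,v\in\mathcal B^d$ is
$d(u,v)=\operatorname{arcosh}\left(1+2\frac{\|u-v\|^2}{(1-\|u\|^2)(1-\|v\|^2)}\right)$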
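The Poincaré distance $d(u,v)=\operatorname{arcosh}\left(1+2\|u-v\|^2/((1-\|u\|^2)(1-\|v\|^2))\right)$ is short enough to write down directly. The following is a minimal pure-Python sketch, assuming both inputs lie strictly inside the unit ball; the function name is illustrative and this is not the authors' implementation.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance between two points of the open unit ball (assumes ||u||, ||v|| < 1)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_dist / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


if __name__ == "__main__":
    # Two pairs with the same Euclidean separation pattern, at different radii.
    print(poincare_distance([0.1, 0.0], [0.0, 0.1]))
    print(poincare_distance([0.9, 0.0], [0.0, 0.9]))
```

Points near the boundary are much farther apart in hyperbolic distance than points near the origin, even when their Euclidean layout looks similar.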

Optimization

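For hierarchically structured data $\mathcal S=\{x_i\}_{i=1}^n$, we want to find embeddings $\Theta=\{\theta_i\}_{i=1}^n$ with $\theta_i\in\mathcal B^d$ such that the Poincaré distance between embeddings reflects the semantic similarity of the corresponding objects. The embeddings are obtained by solving the following optimization problem:
$\Theta'\leftarrow\arg\min_\Theta\mathcal L(\Theta)\quad\text{s.t. }\forall\theta_i\in\Theta:\|\theta_i\|<1$
(The loss function is defined in the “Evaluation/Embedding Taxonomies” section.)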
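Since the Poincaré ball is a Riemannian manifold, the problem can be solved with stochastic Riemannian optimization methods (RSGD, RSVRG, …). Let $\mathcal T_\theta\mathcal B$ be the tangent space at a point $\theta\in\mathcal B^d$, $\nabla_R\in\mathcal T_\theta\mathcal B$ the Riemannian gradient of $\mathcal L(\theta)$, and $\nabla_E$ the Euclidean gradient of $\mathcal L(\theta)$. The RSGD parameter update is:
$\theta_{t+1}=\theta_t-\eta_t\nabla_R\mathcal L(\theta_t)$
Because the Poincaré ball is a conformal model of hyperbolic space, angles between adjacent vectors in the Poincaré ball equal their angles in Euclidean space (the model is angle-preserving), but vector lengths differ between the two spaces. To derive the Riemannian gradient from the Euclidean gradient, $\nabla_E$ is therefore rescaled by the inverse of the Poincaré ball metric tensor, $g_\theta^{-1}$:
$\nabla_R=\frac{(1-\|\theta_t\|^2)^2}{4}\nabla_E$
In addition, the embeddings must be kept inside the unit ball during optimization:
$\mathrm{proj}(\theta)=\begin{cases}\theta/\|\theta\|-\varepsilon & \text{if }\|\theta\|\ge 1\\ \theta & \text{otherwise}\end{cases}$
where $\varepsilon=10^{-5}$. The final parameter update is
$\theta_{t+1}\leftarrow\mathrm{proj}\left(\theta_t-\eta_t\frac{(1-\|\theta_t\|^2)^2}{4}\nabla_E\right)$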
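A single RSGD step can be sketched as follows: rescale the Euclidean gradient by $(1-\|\theta\|^2)^2/4$, take a gradient step, and project back into the ball. The sketch assumes the Euclidean gradient is supplied by the caller, and it interprets the projection as rescaling an out-of-ball point to norm $1-\varepsilon$, which is one common reading of the projection rule rather than something the source spells out. The function names are illustrative.

```python
import math
from typing import List, Sequence

EPS = 1e-5  # margin that keeps projected points strictly inside the unit ball


def project(theta: Sequence[float], eps: float = EPS) -> List[float]:
    """Pull a point back inside the unit ball if its norm is >= 1.

    Assumption: the point is rescaled to norm 1 - eps.
    """
    norm = math.sqrt(sum(x * x for x in theta))
    if norm >= 1.0:
        scale = (1.0 - eps) / norm
        return [x * scale for x in theta]
    return list(theta)


def rsgd_step(theta: Sequence[float], euclid_grad: Sequence[float], lr: float) -> List[float]:
    """One Riemannian SGD step on the Poincaré ball."""
    sq_norm = sum(x * x for x in theta)
    # Inverse metric tensor: converts the Euclidean gradient to the Riemannian one.
    scale = (1.0 - sq_norm) ** 2 / 4.0
    updated = [x - lr * scale * g for x, g in zip(theta, euclid_grad)]
    return project(updated)


if __name__ == "__main__":
    # The same gradient moves a point near the boundary far less than one near the origin.
    print(rsgd_step([0.0, 0.0], [1.0, 0.0], lr=0.4))
    print(rsgd_step([0.9, 0.0], [1.0, 0.0], lr=0.4))
```

The rescaling factor shrinks towards zero near the boundary, so updates there become very small in Euclidean terms; this is the length distortion the metric tensor accounts for.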

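On conformality:
(1) A metric $\tilde g$ is said to be conformal to another metric $g$ if it defines the same angles, i.e.
$\frac{\tilde g_x(u,v)}{\sqrt{\tilde g_x(u,u)}\sqrt{\tilde g_x(v,v)}}=\frac{g_x(u,v)}{\sqrt{g_x(u,u)}\sqrt{g_x(v,v)}}$
for all $x\in M$, $u,v\in T_xM\setminus\{0\}$.
(2) The metric tensor $g_x^{\mathbb D}$ of the Poincaré ball is
$g_x^{\mathbb D}=\lambda_x^2 g^E,\quad\lambda_x=\frac{2}{1-\|x\|^2}$
where $g^E=I_d$ is the Euclidean metric tensor. The two metrics satisfy
$\frac{g_x^{\mathbb D}(u,v)}{\sqrt{g_x^{\mathbb D}(u,u)}\sqrt{g_x^{\mathbb D}(v,v)}}=\frac{\lambda_x^2 g^E(u,v)}{\lambda_x\sqrt{g^E(u,u)}\,\lambda_x\sqrt{g^E(v,v)}}=\frac{g^E(u,v)}{\sqrt{g^E(u,u)}\sqrt{g^E(v,v)}}$
so the Poincaré ball model is conformal.
(3) Equivalently, there exists a smooth function $\lambda:M\to\R$, i.e. the conformal factor, such that $\tilde g_x=\lambda_x^2 g_x$ for all $x\in M$.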

Training Details

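The authors also use a few tricks to improve performance: (1) All embeddings are initialized randomly from the uniform distribution $\mathcal U(-0.001,0.001)$, which places them close to the origin of $\mathcal B^d$ at the start. (2) To obtain a good initial angular layout, training begins with a “burn-in” phase of 10 epochs at a reduced learning rate of $\eta/10$. Combined with the uniform initialization near the origin, this improves the angular layout without moving the embeddings too close to the boundary.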
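The two tricks translate into very little code. The sketch below shows one possible way to write the uniform initialization and the burn-in learning-rate schedule; the function names, the `seed` parameter, and the epoch-indexing convention (epochs counted from 0) are assumptions made for illustration.

```python
import random
from typing import List


def init_embeddings(num_items: int, dim: int, bound: float = 0.001, seed: int = 0) -> List[List[float]]:
    """Draw every coordinate from U(-bound, bound), so all points start near the origin."""
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(num_items)]


def learning_rate(epoch: int, base_lr: float, burn_in_epochs: int = 10, burn_in_factor: float = 10.0) -> float:
    """Use base_lr / burn_in_factor during the burn-in phase, base_lr afterwards."""
    if epoch < burn_in_epochs:
        return base_lr / burn_in_factor
    return base_lr


if __name__ == "__main__":
    embeddings = init_embeddings(num_items=3, dim=2)
    print(embeddings)
    print([learning_rate(e, base_lr=0.1) for e in (0, 9, 10, 11)])
```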

Evaluation

Embedding Taxonomies

The authors run experiments on the transitive closure of the WORDNET noun hierarchy to test how well Poincaré embeddings capture data with a clear latent hierarchical structure. The dataset $\mathcal D=\{(u,v)\}$ contains 743,241 hypernymy relations among 82,115 nouns. The loss function is
$\mathcal{L}(\Theta) = -\sum_{(u,v) \in \mathcal{D}} \log \frac{e^{-d(u,v)}}{e^{-d(u,v)} + \sum_{v'\in \mathcal{N}(u)} e^{-d(u, v')}}$
where $\mathcal N(u)=\{v'\mid(u,v')\notin\mathcal D\}\cup\{u\}$ is the set of negative examples for $u$. During training, 10 negatives are sampled at random for each positive example; the overall procedure closely resembles Word2vec’s Skip-Gram loss with negative sampling.
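The per-pair term of this loss, $-\log\frac{e^{-d(u,v)}}{e^{-d(u,v)}+\sum_{v'}e^{-d(u,v')}}$, together with negative sampling can be sketched as below. This is an illustrative pure-Python version, not the authors' code: the helper names, the string node identifiers, and the choice to sample negatives uniformly without replacement are all assumptions.

```python
import math
import random
from typing import Hashable, List, Sequence, Set, Tuple


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Poincaré distance between two points strictly inside the unit ball."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_dist / ((1.0 - sq_u) * (1.0 - sq_v)))


def sample_negatives(u: Hashable, nodes: Sequence[Hashable],
                     edges: Set[Tuple[Hashable, Hashable]], k: int,
                     rng: random.Random) -> List[Hashable]:
    """Sample up to k nodes v' such that (u, v') is not an observed relation."""
    candidates = [n for n in nodes if (u, n) not in edges]
    return rng.sample(candidates, min(k, len(candidates)))


def pair_loss(emb_u: Sequence[float], emb_v: Sequence[float],
              emb_negatives: Sequence[Sequence[float]]) -> float:
    """Negative log-probability of the positive pair against the sampled negatives."""
    pos = math.exp(-poincare_distance(emb_u, emb_v))
    neg = sum(math.exp(-poincare_distance(emb_u, n)) for n in emb_negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    nodes = ["mammal", "dog", "cat", "car"]
    edges = {("dog", "mammal"), ("cat", "mammal")}
    emb = {"mammal": [0.0, 0.1], "dog": [0.3, 0.4], "cat": [-0.3, 0.4], "car": [0.0, -0.8]}
    negs = sample_negatives("dog", nodes, edges, k=10, rng=random.Random(0))
    print(negs, pair_loss(emb["dog"], emb["mammal"], [emb[n] for n in negs]))
```

The loss is small when the positive pair is much closer than every sampled negative, and grows as negatives move closer to $u$ than the true partner.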
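Reconstruction. To evaluate representation quality directly, the authors reconstruct the data from the embeddings: for each item they compute the probability of every noun, rank the nouns by that probability, and use the rank of the ground truth as the metric. The reported measures are the mean rank of the ground truth over all samples and the mAP.
Link Prediction. To evaluate generalization, the authors split the dataset into training, validation, and test sets and perform link prediction, ranking the distance $d(u,v)$ of each positive pair among the distances of all negative pairs $\{d(u,v')\mid(u,v')\notin\mathcal D\}$. The reported measures are the mean rank of the positive pairs and the mAP.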
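Both metrics can be computed from two lists of distances for a fixed $u$: distances to its true partners and distances to its negatives. The sketch below is one plausible way to compute them; the function names are hypothetical, ties are resolved pessimistically (a negative at the same distance is ranked ahead of the positive in the average-precision ordering), and at least one positive per node is assumed.

```python
from typing import List, Sequence


def ranks_among_negatives(pos_dists: Sequence[float], neg_dists: Sequence[float]) -> List[int]:
    """Rank of each positive distance among the negative distances (1 = best)."""
    return [1 + sum(1 for n in neg_dists if n < p) for p in pos_dists]


def average_precision(pos_dists: Sequence[float], neg_dists: Sequence[float]) -> float:
    """Average precision of the ranking induced by increasing distance.

    Assumes pos_dists is non-empty.
    """
    labeled = sorted([(d, 1) for d in pos_dists] + [(d, 0) for d in neg_dists])
    hits = 0
    precisions = []
    for position, (_, is_positive) in enumerate(labeled, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions)


if __name__ == "__main__":
    pos = [0.1, 0.5]
    neg = [0.3, 0.9]
    ranks = ranks_among_negatives(pos, neg)
    print(sum(ranks) / len(ranks), average_precision(pos, neg))
```

Averaging the per-node ranks over all positive pairs gives the mean rank, and averaging the per-node average precision over all nodes gives the mAP.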


Euclidean: $d(u, v) = \|u - v\|^2$
Translational: $d(u, v) = \|u - v + r\|^2$. For this score function, we also learn the global translation vector $r$ during training.

The figure shows a visualization of two-dimensional Poincaré embeddings of the mammals subtree; blue edges are ground-truth “is-a” relations. A Poincaré embedding with $d = 5$ achieves mean rank 1.26 and MAP 0.927 on this subtree.

Network Embeddings

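The authors run link prediction experiments on 4 social network datasets. The probability of an edge is computed as
$P((u,v)=1\mid\Theta)=\frac{1}{e^{(d(u,v)-r)/t}+1}$
where $r, t > 0$ are hyperparameters.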
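The edge probability used here has the Fermi-Dirac form $1/(e^{(d(u,v)-r)/t}+1)$, which maps a distance to a value in $(0,1)$. A minimal sketch follows, with $r$ acting as the distance at which the probability equals 0.5 and $t$ controlling how sharp the transition is; the function name and the example values of $r$ and $t$ are illustrative.

```python
import math


def edge_probability(distance: float, r: float, t: float) -> float:
    """Fermi-Dirac edge probability for a given hyperbolic distance.

    Assumes r, t > 0; math.exp may overflow for very large (distance - r) / t.
    """
    return 1.0 / (math.exp((distance - r) / t) + 1.0)


if __name__ == "__main__":
    for d in (0.0, 1.0, 2.0, 4.0):
        print(d, edge_probability(d, r=2.0, t=0.5))
```

Pairs closer than $r$ get a probability above 0.5 and pairs farther than $r$ get one below 0.5; a smaller $t$ makes the decision boundary sharper.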


Lexical Entailment


References

paper: Nickel, Maximilian, and Douwe Kiela. “Poincaré embeddings for learning hierarchical representations.” Advances in neural information processing systems 30 (2017).
code: https://github.com/facebookresearch/poincare-embeddings
Implementation by Gensim: https://radimrehurek.com/gensim/models/poincare.html and a jupyter notebook tutorial: https://nbviewer.org/github/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Poincare%20Tutorial.ipynb
Implemented by “Hyperbolic Entailment Cones for Learning Hierarchical Embeddings”: https://github.com/dalab/hyperbolic_cones
Hyperbolic Geometry and Poincaré Embeddings
Implementing Poincaré Embeddings