[LDF] Linear Discriminant Functions (Part 1)

news2025/2/21 20:47:47

Decision Rules Based on Discriminant Functions

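For a $c$-class classification problem, let $g_i(\mathbf{x}), i=1,2, \ldots, c$, denote the discriminant function of each class. The decision rule is: if $g_i(\mathbf{x})>g_j(\mathbf{x}), \forall j \neq i$, then $\mathbf{x}$ is assigned to class $\omega_i$. In other words, a sample is assigned to the class whose discriminant function value is largest.
For a two-class problem a single discriminant function suffices, defined as $g(\mathbf{x})=g_1(\mathbf{x})-g_2(\mathbf{x})$. The decision rule is: if $g(\mathbf{x})>0$, assign $\mathbf{x}$ to class $\omega_1$; otherwise assign it to class $\omega_2$.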
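The two rules above can be sketched in a few lines of Python. This is only a minimal illustration: the three 1-D discriminants `g1`, `g2`, `g3` are hypothetical functions chosen for the example, and classes are indexed from 0.

```python
def classify(discriminants, x):
    """Return the 0-based index i of the class whose g_i(x) is largest."""
    scores = [g(x) for g in discriminants]
    return max(range(len(scores)), key=lambda i: scores[i])


def classify_two(g_a, g_b, x):
    """Two-class shortcut: decide by the sign of g(x) = g_a(x) - g_b(x)."""
    return 0 if g_a(x) - g_b(x) > 0 else 1


# Hypothetical discriminants (assumptions made for illustration only).
def g1(x):
    return -(x - 0.0) ** 2


def g2(x):
    return -(x - 2.0) ** 2


def g3(x):
    return -(x - 5.0) ** 2


if __name__ == "__main__":
    print(classify([g1, g2, g3], 1.8))   # class with the largest g_i(1.8)
    print(classify_two(g1, g2, 0.3))     # sign of g1(0.3) - g2(0.3)
```

Away from ties, `classify_two(g1, g2, x)` and `classify([g1, g2], x)` return the same index, which is the sense in which one function $g$ replaces two.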

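[Example] Suppose the discriminant function of each class is the posterior probability, $g_i(\mathbf{x})=p\left(\omega_i \mid \mathbf{x}\right)$. Then the two-class discriminant function can be written as
$g(\mathbf{x})=p\left(\omega_1 \mid \mathbf{x}\right)-p\left(\omega_2 \mid \mathbf{x}\right)$ or $g(\mathbf{x})=\log \frac{p\left(\mathbf{x} \mid \omega_1\right)}{p\left(\mathbf{x} \mid \omega_2\right)}+\log \frac{p\left(\omega_1\right)}{p\left(\omega_2\right)}$
The two forms take different values but always have the same sign, so they lead to the same decision.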
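As a quick numerical check that the posterior-difference form and the log-ratio form agree in sign, here is a small sketch. The 1-D Gaussian class-conditional densities and the priors `PRIOR_1`, `PRIOR_2` are assumptions made up for this example, not part of the original text.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical model (illustrative values only).
PRIOR_1, PRIOR_2 = 0.3, 0.7


def likelihood_1(x):
    return gaussian_pdf(x, 0.0, 1.0)


def likelihood_2(x):
    return gaussian_pdf(x, 2.0, 1.0)


def g_posterior_diff(x):
    """g(x) = p(w1 | x) - p(w2 | x), via Bayes' rule."""
    joint_1 = likelihood_1(x) * PRIOR_1
    joint_2 = likelihood_2(x) * PRIOR_2
    evidence = joint_1 + joint_2
    return joint_1 / evidence - joint_2 / evidence


def g_log_ratio(x):
    """g(x) = log likelihood ratio + log prior ratio."""
    return (math.log(likelihood_1(x) / likelihood_2(x))
            + math.log(PRIOR_1 / PRIOR_2))


if __name__ == "__main__":
    for x in (-1.0, 0.5, 1.0, 3.0):
        print(x, g_posterior_diff(x) > 0, g_log_ratio(x) > 0)
```

For every `x` printed, the two boolean columns match, even though the numeric values of the two functions differ.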

Linear Discriminant Functions and Decision Surfaces

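The basic form of a linear discriminant function is:
$g(\mathbf{x})=\mathbf{w}^T \mathbf{x}+w_0$
where $\mathbf{w}$ is the weight vector and $w_0$ is the bias (offset).
Decision rule for the two-class case:
$\begin{cases}\mathbf{x} \in \omega_1, & \text { if } g(\mathbf{x})>0 \\ \mathbf{x} \in \omega_2, & \text { if } g(\mathbf{x})<0 \\ \text {uncertain, } & \text { if } g(\mathbf{x})=0\end{cases}$
$g(\mathbf{x})=0$ defines a decision surface, the boundary between classes $\omega_1$ and $\omega_2$.
Because $g$ is linear, $g(\mathbf{x})=0$ is a hyperplane, denoted $H$. Any vector lying in this hyperplane is orthogonal to $\mathbf{w}$: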

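[Proof] If $\mathbf{x}_1$ and $\mathbf{x}_2$ both lie on the hyperplane, then $g\left(\mathbf{x}_1\right)=g\left(\mathbf{x}_2\right)=0$, so:
$g\left(\mathbf{x}_1\right)-g\left(\mathbf{x}_2\right)=\mathbf{w}^T\left(\mathbf{x}_1-\mathbf{x}_2\right)=0$
Hence $\mathbf{w}$ is orthogonal to every vector $\mathbf{x}_1-\mathbf{x}_2$ lying in $H$; that is, $\mathbf{w}$ is a normal vector of $H$.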

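Any sample $\mathbf{x}$ can be projected onto the decision surface and written as the sum of two vectors:
$\mathbf{x}=\mathbf{x}_p+r \frac{\mathbf{w}}{\|\mathbf{w}\|}$
where $\mathbf{x}_p$ is the projection of $\mathbf{x}$ onto the hyperplane $H$, and $r$ is the signed (algebraic) distance from $\mathbf{x}$ to $H$. If $\mathbf{x}$ lies on the positive side of the hyperplane, then $r > 0$; otherwise $r < 0$.
Since $g\left(\mathbf{x}_p\right)=0$, we have
$\begin{aligned} g(\mathbf{x}) & =\mathbf{w}^T\left(\mathbf{x}_p+r \frac{\mathbf{w}}{\|\mathbf{w}\|}\right)+w_0 \\ & =\left(\mathbf{w}^T \mathbf{x}_p+w_0\right)+r \frac{\mathbf{w}^T \mathbf{w}}{\|\mathbf{w}\|} \\ & =r\|\mathbf{w}\| \\ \Rightarrow r & =\frac{g(\mathbf{x})}{\|\mathbf{w}\|} \end{aligned}$
In particular, the signed distance from the origin to the hyperplane is $w_0 /\|\mathbf{w}\|$ (set $\mathbf{x}=\mathbf{0}$, or use the point-to-plane distance formula).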
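The decomposition $\mathbf{x}=\mathbf{x}_p+r\,\mathbf{w}/\|\mathbf{w}\|$ is easy to verify numerically. The sketch below uses a hypothetical hyperplane in 2-D (`w = [3, 4]`, `w0 = -5`, chosen so that $\|\mathbf{w}\|=5$); the helper names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def g(w, w0, x):
    """Linear discriminant g(x) = w^T x + w0."""
    return dot(w, x) + w0


def signed_distance(w, w0, x):
    """Signed distance r = g(x) / ||w|| from x to the hyperplane g(x) = 0."""
    return g(w, w0, x) / norm(w)


def project(w, w0, x):
    """Projection x_p = x - r * w / ||w|| of x onto the hyperplane."""
    r = signed_distance(w, w0, x)
    n = norm(w)
    return [xi - r * wi / n for xi, wi in zip(x, w)]


if __name__ == "__main__":
    w, w0 = [3.0, 4.0], -5.0           # hypothetical hyperplane, ||w|| = 5
    x = [3.0, 4.0]
    r = signed_distance(w, w0, x)      # g(x) = 20, so r = 20 / 5
    x_p = project(w, w0, x)
    print(r, x_p, g(w, w0, x_p))       # g(x_p) should be (numerically) zero
    print(signed_distance(w, w0, [0.0, 0.0]))  # origin: w0 / ||w||
```

Here the sample lies on the positive side ($r>0$), while the origin lies on the negative side because $w_0<0$.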
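For multi-class problems, a multi-class classifier can be built by combining several two-class classifiers $g_i(\mathbf{x})=\mathbf{w}_i^T \mathbf{x}+w_{i 0}, \quad i=1,2, \ldots, k$
- One-vs-all: each class is paired against all of the other classes, giving $c$ two-class classifiers. Many ambiguous regions remain, and training each two-class classifier uses all of the samples.
- One-vs-one: classes are paired one against one, giving $c (c - 1) /2$ two-class classifiers. Ambiguous regions still exist but are fewer. Many classifiers must be trained, but each needs only part of the data (its two classes), and a linearly separable result is easier to obtain (in one-vs-all, a single class and the union of all remaining classes may well not be linearly separable).
If we change the decision rule to
$\mathbf{x} \in \omega_i, \quad g_i(\mathbf{x})=\max _{j=1,2, \ldots, c} g_j(\mathbf{x})$
there are no longer any ambiguous regions, although the final decision boundaries change.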
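To make the difference between the sign-based one-vs-all rule and the max rule concrete, here is a small sketch with three hypothetical linear discriminants in 2-D. The weights in `WEIGHTS` and `BIASES` are invented for illustration; `one_vs_all_decision` returns `None` when a point falls in an ambiguous region (no classifier, or more than one classifier, claims it).

```python
def linear_scores(weights, biases, x):
    """Evaluate g_i(x) = w_i^T x + w_i0 for every class i."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def one_vs_all_decision(scores):
    """Class index if exactly one g_i(x) > 0, otherwise None (ambiguous)."""
    positive = [i for i, s in enumerate(scores) if s > 0]
    return positive[0] if len(positive) == 1 else None


def max_decision(scores):
    """Modified rule: assign x to the class with the largest g_i(x)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical weights for three classes (assumptions for illustration).
WEIGHTS = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
BIASES = [-1.0, -1.0, -1.0]

if __name__ == "__main__":
    for x in ([3.0, 0.0], [0.5, 0.2], [2.0, 3.0]):
        s = linear_scores(WEIGHTS, BIASES, x)
        print(x, one_vs_all_decision(s), max_decision(s))
```

The second point has all $g_i<0$ and the third has two positive $g_i$, so the sign-based rule cannot decide either of them, whereas the max rule still returns a class.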

Nonlinear Discriminant Functions

Linear case:
$g(\mathbf{x})=w_0+\sum_{i=1}^d w_i x_i, \quad \text{where } \mathbf{x}=\left[x_1, x_2, \ldots, x_d\right]^T$
This can be extended with second-order terms while still being treated as a linear function (in the generalized sense):
$\begin{aligned} g(\mathbf{x}) & =w_0+\sum_{i=1}^d w_i x_i+\sum_{i=1}^d \sum_{j=1}^d w_{i j} x_i x_j =\sum_{i=1}^{\hat{d}} a_i y_i(\mathbf{x}) \end{aligned}$
$\begin{aligned} & y_1(\mathbf{x})=1 \\ & y_2(\mathbf{x})=x_1 \\ & y_3(\mathbf{x})=x_2 \\ & \ldots \\ & y_{d+1}(\mathbf{x})=x_d \\ & y_{d+2}(\mathbf{x})=x_1^2 \\ & y_{d+3}(\mathbf{x})=x_1 x_2 \\ & \ldots \\ & y_{\frac{(d+1)(d+2)}{2}}(\mathbf{x})=x_d^2 \end{aligned}$
Since $w_{i j}=w_{j i}$, there are $\hat{d}=1+d+d+(d^2-d)/2=(d+1)(d+2)/2$ coefficients to estimate: one constant, $d$ linear terms, $d$ squared terms and $(d^2-d)/2$ cross terms. $g(\mathbf{x})=0$ is the decision surface, which is a quadric hypersurface.
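The quadratic feature map can be sketched as follows. The function name `quadratic_features` is illustrative; the ordering follows the list above (constant, linear terms, then the products $x_i x_j$ with $i \le j$), so the output length is $(d+1)(d+2)/2$.

```python
def quadratic_features(x):
    """Map x = [x_1, ..., x_d] to y = [1, x_1, ..., x_d, x_1^2, x_1 x_2, ..., x_d^2]."""
    d = len(x)
    y = [1.0] + [float(v) for v in x]
    for i in range(d):
        for j in range(i, d):          # i <= j: each product appears once
            y.append(float(x[i] * x[j]))
    return y


def generalized_g(a, x):
    """Generalized linear discriminant g(x) = a^T y(x)."""
    y = quadratic_features(x)
    if len(a) != len(y):
        raise ValueError("a must have (d+1)(d+2)/2 entries")
    return sum(ai * yi for ai, yi in zip(a, y))


if __name__ == "__main__":
    print(quadratic_features([2.0, 3.0]))
    for d in range(1, 6):
        print(d, len(quadratic_features([0.0] * d)), (d + 1) * (d + 2) // 2)
```

Because each cross term $x_i x_j$ ($i<j$) appears only once in $\mathbf{y}$, its coefficient in $\mathbf{a}$ absorbs both $w_{ij}$ and $w_{ji}$.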
General case:
$g(\mathbf{x})=\sum_{i=1}^{\hat{d}} a_i y_i(\mathbf{x})$

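$\mathbf{a}$ is the generalized weight vector, and $\mathbf{y}$ is the new data point obtained by transforming $\mathbf{x}$.
The generalized discriminant function $g(\mathbf{x})$ is nonlinear in $\mathbf{x}$ but linear in $\mathbf{y}$.
$g(\mathbf{x})$ is homogeneous in $\mathbf{y}$, which means the decision surface passes through the origin of the new space. The signed distance from any point $\mathbf{y}$ to the decision surface is $\mathbf{a}^T \mathbf{y} /\|\mathbf{a}\|$ (the length of the projection of the point onto the weight vector).
When the dimension of the new space is high enough, $g(\mathbf{x})$ can approximate any discriminant function.
However, when the dimension of the new space is far higher than the dimension $d$ of the original space, the curse of dimensionality arises.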

[Example] Consider a one-dimensional sample space $\mathrm{X}$. We want $x$ to belong to the first class $\omega_1$ if $x < - 1$ or $x > 0.5$, and to the second class $\omega_2$ if $- 1 < x < 0.5$. Design a discriminant function $g (x)$.
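$\begin{aligned} g(x) & =(x-0.5)(x+1) \\ & =-0.5+0.5 x+x^2 \\ & =a_1+a_2 x+a_3 x^2 \end{aligned}$
That is, $\mathbf{y}=\left[1, x, x^2\right]^T$ and $\mathbf{a}=\left[-0.5, 0.5, 1\right]^T$, so $g$ is quadratic in $x$ but linear in $\mathbf{y}$.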
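The example can be checked directly by evaluating $\mathbf{a}^T\mathbf{y}$ with $\mathbf{y}=[1, x, x^2]^T$. The function names below are illustrative, and the class labels 1 and 2 stand for $\omega_1$ and $\omega_2$.

```python
A = [-0.5, 0.5, 1.0]   # coefficients a_1, a_2, a_3 from the example


def phi(x):
    """Feature map y(x) = [1, x, x^2]."""
    return [1.0, x, x * x]


def g(x):
    """g(x) = a^T y(x), which equals (x - 0.5)(x + 1)."""
    return sum(ai * yi for ai, yi in zip(A, phi(x)))


def classify(x):
    """Return 1 for class w1 (g > 0), 2 for class w2 (g < 0), None on the boundary."""
    value = g(x)
    if value > 0:
        return 1
    if value < 0:
        return 2
    return None


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 0.5, 1.0):
        print(x, g(x), classify(x))
```

Points with $x<-1$ or $x>0.5$ land in class 1, points strictly between fall in class 2, and the two boundary points give $g(x)=0$.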

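The linear discriminant function can be written in a homogeneous augmented representation:
$\mathbf{y}=\left(\begin{array}{l} 1 \\ \mathbf{x} \end{array}\right)=\left[\begin{array}{llll} 1 & x_1 & \cdots & x_d \end{array}\right]^T, \quad \mathbf{a}=\left(\begin{array}{l} w_0 \\ \mathbf{w} \end{array}\right)=\left[\begin{array}{llll} w_0 & w_1 & \cdots & w_d \end{array}\right]^T$
$g(\mathbf{x})=\mathbf{w}^T \mathbf{x}+w_0=\mathbf{a}^T \mathbf{y}$
The signed distance from any point $\mathbf{y}$ in $Y$-space to the decision hyperplane $\mathbf{a}^T \mathbf{y}=0$ is: $\quad r=\frac{g(\mathbf{x})}{\|\mathbf{a}\|}=\frac{\mathbf{a}^T \mathbf{y}}{\|\mathbf{a}\|}$
Since $\|\mathbf{a}\| \geq\|\mathbf{w}\|$, this distance is no larger in magnitude than the distance $g(\mathbf{x}) /\|\mathbf{w}\|$ from $\mathbf{x}$ to $H$ in the original space.
The homogeneous augmented space adds one dimension but preserves the Euclidean distances between samples, and the classification result is the same as with the original decision surface. The decision surface now passes through the origin, which is an advantage for some analyses.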
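A short sketch of the augmented representation, reusing a hypothetical hyperplane (`w = [3, 4]`, `w0 = -5`) and two made-up sample points. It checks that $\mathbf{a}^T\mathbf{y}$ reproduces $g(\mathbf{x})$, that distances between samples are unchanged by augmentation, and that the distance to the decision surface in $Y$-space is scaled by $\|\mathbf{a}\|$ instead of $\|\mathbf{w}\|$.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def augment(x):
    """y = [1, x_1, ..., x_d]."""
    return [1.0] + list(x)


def augment_weights(w, w0):
    """a = [w0, w_1, ..., w_d]."""
    return [w0] + list(w)


def distance(u, v):
    return norm([p - q for p, q in zip(u, v)])


if __name__ == "__main__":
    w, w0 = [3.0, 4.0], -5.0            # hypothetical values for illustration
    x1, x2 = [3.0, 4.0], [0.0, 1.0]
    a = augment_weights(w, w0)
    y1, y2 = augment(x1), augment(x2)

    print(dot(w, x1) + w0, dot(a, y1))          # same value of g
    print(distance(x1, x2), distance(y1, y2))   # sample distances preserved
    print((dot(w, x1) + w0) / norm(w),          # distance to H in X-space
          dot(a, y1) / norm(a))                 # distance in Y-space
```

The first two printed pairs coincide, while the last pair differs because the normalising factor changes from $\|\mathbf{w}\|$ to $\|\mathbf{a}\|$.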