LDA (Fisher) Linear Discriminant Analysis
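For a binary classification problem, suppose there is a linear projection $y_i=Wx_i$ that maps each sample $x_i$ of the data set $\pmb X$ onto a one-dimensional space (here $W$ is a row vector, so each $y_i$ is a scalar).
To separate the two classes well, the projected samples of the same class should have as small a variance (spread) as possible, while the projected samples of different classes should lie as far apart as possible.
Let the samples be divided into two classes, $w_1$ and $w_2$.
We can then compute the following quantities.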
Mean vector of each class (where $N_i$ is the number of samples in class $w_i$, and a bar marks the corresponding quantity after projection)
$$
\begin{aligned}
m_i&=\frac{1}{N_i}\sum_{x \in w_i}x\\
\bar{m_i}&=\frac{1}{N_i}\sum_{y \in w_i}y
\end{aligned}
$$
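The two mean definitions above can be checked numerically. The following is a minimal NumPy sketch; the toy arrays `X1`, `X2` and the projection `W` are made-up values for illustration, not data from this article.

```python
import numpy as np

# Hypothetical toy data (assumption): rows are samples, columns are features.
X1 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 2.0]])  # class w_1
X2 = np.array([[6.0, 5.0], [7.0, 7.0], [8.0, 6.0], [7.0, 6.0]])  # class w_2
W = np.array([1.0, 0.0])  # arbitrary projection direction, for illustration only

# Class means in the original space: m_i = (1/N_i) * sum of x over class w_i.
m1 = X1.sum(axis=0) / len(X1)
m2 = X2.sum(axis=0) / len(X2)

# Projected samples y = W x and their class means.
y1, y2 = X1 @ W, X2 @ W
m1_bar = y1.sum() / len(y1)
m2_bar = y2.sum() / len(y2)

# Because the projection is linear, each projected mean equals W m_i.
print(m1_bar, W @ m1)
print(m2_bar, W @ m2)
```

Since the projection is linear, the projected mean is simply the projection of the original mean, which is the fact the later derivation relies on.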
Within-class scatter matrix of each class
$$
\begin{aligned}
S_i&=\sum_{x \in w_i}(x-m_i){(x-m_i)}^T\\
\bar{S_i}&=\sum_{y \in w_i}{(y-\bar{m_i})}^2
\end{aligned}
$$
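As a rough illustration of the scatter definitions above, the sketch below computes $S_i$ and $\bar{S_i}$ for one class. The helper names `scatter_matrix` and `projected_scatter`, the toy class `X1`, and the direction `W` are assumptions introduced here for the example.

```python
import numpy as np

def scatter_matrix(X):
    """S_i: sum over the rows x of X of (x - m_i)(x - m_i)^T."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered

def projected_scatter(X, W):
    """Scalar scatter of y = W x: sum of (y - mean(y))^2."""
    y = X @ W
    return float(np.sum((y - y.mean()) ** 2))

# Hypothetical toy class (assumption, not from the article).
X1 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 2.0]])
W = np.array([1.0, 0.0])

S1 = scatter_matrix(X1)
S1_bar = projected_scatter(X1, W)
print(S1)
print(S1_bar)
```

Note that $S_i$ is an unnormalized sum, so it equals $(N_i-1)$ times the sample covariance returned by `np.cov(X1, rowvar=False)`.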
Total within-class scatter matrix
$$
\begin{aligned}
S_w&=S_1+S_2\\
\bar{S_w}&=\bar{S_1}+\bar{S_2}
\end{aligned}
$$
Between-class scatter matrix
$$
\begin{aligned}
S_b&=(m_1-m_2){(m_1-m_2)}^T\\
\bar{S_b}&=(\bar{m_1}-\bar{m_2}){(\bar{m_1}-\bar{m_2})}^T
\end{aligned}
$$
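The total within-class and between-class scatter matrices defined above might be computed as follows. This is only a sketch on made-up data; `X1`, `X2`, and the helper `scatter_matrix` are hypothetical names chosen for the example.

```python
import numpy as np

# Hypothetical toy data (assumption): rows are samples, columns are features.
X1 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 2.0]])  # class w_1
X2 = np.array([[6.0, 5.0], [7.0, 7.0], [8.0, 6.0], [7.0, 6.0]])  # class w_2

def scatter_matrix(X):
    """Within-class scatter: sum of (x - m)(x - m)^T over the rows of X."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# S_w = S_1 + S_2
S_w = scatter_matrix(X1) + scatter_matrix(X2)

# S_b = (m_1 - m_2)(m_1 - m_2)^T, built from an explicit column vector.
diff = (m1 - m2).reshape(-1, 1)
S_b = diff @ diff.T

print(S_w)
print(S_b)
print(np.linalg.matrix_rank(S_b))
```

Because $S_b$ is the outer product of a single vector, its rank is at most 1, which is what allows the eigenvalue problem later in the derivation to collapse to a closed form.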
Fisher criterion function
$$
J_F(W)=\frac{{(\bar{m_1}-\bar{m_2})}^2}{\bar{S_1}+\bar{S_2}}
$$
Our optimization goal is therefore to maximize $J_F(W)$. To express it in terms of $W$, note that $\bar{m_i}=Wm_i$, so the numerator becomes
$$
\begin{aligned}
{(\bar{m_1}-\bar{m_2})}^2&=W(m_1-m_2)(m_1-m_2)^TW^T\\
&=WS_bW^T
\end{aligned}
$$
$$
\begin{aligned}
\bar{S_i}&=\sum_{y \in w_i}{(y-\bar{m_i})}^2\\
&=\sum_{x \in w_i}(Wx-Wm_i)^2\\
&=\sum_{x \in w_i}W(x-m_i)(x-m_i)^TW^T\\
&=WS_iW^T
\end{aligned}
$$
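The identity $\bar{S_i}=WS_iW^T$ derived above can be sanity-checked on random data. The sketch below assumes a hypothetical class of 50 samples with 3 features and an arbitrary direction `W`; none of these values come from the article.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the check is repeatable
X = rng.normal(size=(50, 3))     # hypothetical class: 50 samples, 3 features
W = rng.normal(size=3)           # arbitrary projection direction

m = X.mean(axis=0)
S = (X - m).T @ (X - m)          # within-class scatter matrix S_i

y = X @ W                        # projected samples y = W x
S_bar_direct = np.sum((y - y.mean()) ** 2)  # definition in the projected space
S_bar_matrix = W @ S @ W                    # W S_i W^T from the derivation

print(S_bar_direct, S_bar_matrix)
```

The two numbers should agree up to floating-point rounding.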
$$
\begin{aligned}
J_F(W)&=\frac{{(\bar{m_1}-\bar{m_2})}^2}{\bar{S_1}+\bar{S_2}}\\
&=\frac{WS_bW^T}{W(S_1+S_2)W^T}\\
&=\frac{WS_bW^T}{WS_wW^T}
\end{aligned}
$$
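To make the two forms of the criterion concrete, the sketch below evaluates $J_F(W)$ both from the projected samples and from the matrix expression. The function name `fisher_criterion` and the toy data are assumptions made for this example.

```python
import numpy as np

def fisher_criterion(W, X1, X2):
    """J_F(W) = (W S_b W^T) / (W S_w W^T) for a 1-D direction W."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    S_b = np.outer(m1 - m2, m1 - m2)
    return float(W @ S_b @ W) / float(W @ S_w @ W)

# Hypothetical toy data (assumption): rows are samples, columns are features.
X1 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 2.0]])
X2 = np.array([[6.0, 5.0], [7.0, 7.0], [8.0, 6.0], [7.0, 6.0]])
W = np.array([1.0, 0.0])  # arbitrary direction, for illustration only

# Matrix form of the criterion.
J = fisher_criterion(W, X1, X2)

# Same quantity computed directly in the projected space.
y1, y2 = X1 @ W, X2 @ W
J_direct = (y1.mean() - y2.mean()) ** 2 / (
    np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
)

# Rescaling W leaves the criterion unchanged.
J_scaled = fisher_criterion(3.0 * W, X1, X2)

print(J, J_direct, J_scaled)
```

The scale invariance shown by `J_scaled` is why only the direction of $W$ matters when maximizing $J_F(W)$.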
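Because $J_F(W)$ does not change when $W$ is rescaled, we can fix the denominator to a nonzero constant, $WS_wW^T=c$, and maximize the numerator using the method of Lagrange multipliers
$$
L(W,\lambda)=WS_bW^T-\lambda(WS_wW^T-c)
$$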
Take the partial derivative with respect to $W$. From here on $W$ is written as a column vector (the transpose of the row vector used in the projection), and $S_b$ and $S_w$ are symmetric, so
$$
\begin{aligned} \frac{\partial L(W,\lambda)}{\partial W}=2(S_bW-\lambda S_wW) \end{aligned}
$$
Setting the partial derivative to zero
$$
\begin{aligned} \frac{\partial L(W,\lambda)}{\partial W}=2(S_bW-\lambda S_wW)=0 \end{aligned}
$$
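The gradient above rests on a standard identity for quadratic forms. As a short worked step, writing $W$ as a column vector and taking $A$ to be a symmetric matrix (both $S_b$ and $S_w$ are symmetric, being sums of outer products):

$$
\begin{aligned}
\frac{\partial}{\partial W}\left(W^TAW\right)&=(A+A^T)W=2AW\\
\frac{\partial L(W,\lambda)}{\partial W}&=2S_bW-2\lambda S_wW
\end{aligned}
$$

The common factor 2 does not change where the derivative vanishes, so it is often dropped.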
It follows that
$$
S_bW=\lambda S_wW
$$
Assuming $S_w$ is a nonsingular (invertible) matrix, we obtain
$$
S_w^{-1}S_bW=\lambda W
$$
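This can be viewed as finding an eigenvector of the matrix $S_w^{-1}S_b$, with $\lambda$ as the corresponding eigenvalue. Substituting the definition of $S_b$ gives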
$$
S_w^{-1}S_bW=S_w^{-1}(m_1-m_2)(m_1-m_2)^TW
$$
Here $(m_1-m_2)^TW$ is a scalar; denote it by $R$. Then
$$
\lambda W=S_w^{-1}(m_1-m_2)R
$$
Hence
$$
W=\frac{R}{\lambda}S_w^{-1}(m_1-m_2)
$$
Since we are only looking for the direction of $W$, the scalar factor $\frac{R}{\lambda}$ can be dropped, so
$$
W^*=S_w^{-1}(m_1-m_2)
$$
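The closed-form direction can be cross-checked numerically. The sketch below, on made-up toy data, compares $S_w^{-1}(m_1-m_2)$ with the dominant eigenvector of $S_w^{-1}S_b$ and with randomly drawn directions; the names `J`, `w_star`, and the data are assumptions for illustration, and $S_w$ is assumed to be nonsingular.

```python
import numpy as np

# Hypothetical toy data (assumption): rows are samples, columns are features.
X1 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 2.0]])  # class w_1
X2 = np.array([[6.0, 5.0], [7.0, 7.0], [8.0, 6.0], [7.0, 6.0]])  # class w_2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
S_b = np.outer(m1 - m2, m1 - m2)

def J(w):
    """Fisher criterion for a direction w given as a 1-D array."""
    return (w @ S_b @ w) / (w @ S_w @ w)

# Closed form: solve S_w w = (m1 - m2) rather than forming the inverse explicitly.
w_star = np.linalg.solve(S_w, m1 - m2)

# Cross-check 1: w_star is parallel to the top eigenvector of S_w^{-1} S_b.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
top = np.argmax(eigvals.real)
v = eigvecs[:, top].real
cosine = abs(v @ w_star) / (np.linalg.norm(v) * np.linalg.norm(w_star))

# Cross-check 2: no randomly drawn direction scores higher than w_star.
rng = np.random.default_rng(0)
best_random = max(J(w) for w in rng.normal(size=(1000, 2)))

print(w_star, cosine, J(w_star), best_random)
```

Using `np.linalg.solve` instead of an explicit inverse is a common numerical choice; the result is the same direction up to rounding.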
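In summary, the direction $W^*=S_w^{-1}(m_1-m_2)$ maximizes the Fisher criterion, so projecting the samples onto $W^*$ lets LDA handle the binary classification problem well.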
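To see the projection at work, the sketch below projects two made-up classes onto $W^*$ and classifies points with a midpoint threshold. The threshold rule and the helper `predict` are assumptions added for illustration; the derivation above only yields the direction and does not prescribe a decision rule.

```python
import numpy as np

# Hypothetical toy data (assumption): rows are samples, columns are features.
X1 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 2.0]])  # class w_1
X2 = np.array([[6.0, 5.0], [7.0, 7.0], [8.0, 6.0], [7.0, 6.0]])  # class w_2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_star = np.linalg.solve(S_w, m1 - m2)  # W* = S_w^{-1} (m_1 - m_2)

# Project both classes onto the Fisher direction.
y1, y2 = X1 @ w_star, X2 @ w_star

# Assumed decision rule (not derived in the text): threshold at the midpoint
# of the two projected class means.
threshold = 0.5 * (y1.mean() + y2.mean())

def predict(x):
    """Return 1 for class w_1 and 2 for class w_2 under the midpoint rule."""
    y = x @ w_star
    # Class 1 lies on the same side of the threshold as its projected mean.
    same_side_as_class1 = (y > threshold) == (y1.mean() > threshold)
    return 1 if same_side_as_class1 else 2

predictions1 = [predict(x) for x in X1]
predictions2 = [predict(x) for x in X2]
print(predictions1, predictions2)
```

On this toy data the two projected classes do not overlap, so the midpoint rule assigns every training sample to its own class; real data may of course overlap after projection.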