Introduction
SVM solves classification problems; the model that applies the same idea to regression is called Support Vector Regression (SVR). The goal is a model whose output $f(\vec{x})$ is as close to $y$ as possible.
A traditional regression model computes the loss directly from the difference between $f(\vec{x})$ and $y$, so the loss is zero only when the two are exactly equal. SVR, borrowing the support-vector idea, tolerates a deviation of up to $\varepsilon$: any sample whose target lies within $\varepsilon$ of $f(\vec{x})$ is considered predicted correctly and incurs zero loss.
Building the Mathematical Model
Following the same recipe as SVM, we can write down SVR's loss function and the resulting optimization problem:
$$\min_{\vec{w},b}\ \frac{1}{2}\|\vec{w}\|^{2}+C\sum_{i=1}^{m}\ell_{\varepsilon}\big(f(\vec{x}_{i})-y_{i}\big)$$
where
$$\ell_{\varepsilon}(z)=\begin{cases}0, & \text{if } |z| \le \varepsilon;\\ |z|-\varepsilon, & \text{otherwise}\end{cases}$$
is called the $\varepsilon$-insensitive loss.
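A minimal NumPy sketch of this loss makes the flat zero region inside the tube concrete (the function name and vectorized form are my own):

```python
import numpy as np

def eps_insensitive_loss(z, eps):
    """epsilon-insensitive loss: zero inside the tube |z| <= eps,
    linear in |z| - eps outside it."""
    return np.maximum(0.0, np.abs(z) - eps)

# Residuals of +/-0.05 fall inside an eps=0.1 tube and cost nothing;
# a residual of 0.3 costs 0.3 - 0.1 = 0.2.
print(eps_insensitive_loss(np.array([-0.05, 0.05, 0.3]), eps=0.1))  # [0.  0.  0.2]
```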
Next comes the classic Lagrangian treatment of this quadratic program. Introduce slack variables $\xi_{i}$ and $\hat{\xi}_{i}$:
$$\min_{\vec{w},b,\xi_{i},\hat{\xi}_{i}}\ \frac{1}{2}\|\vec{w}\|^{2}+C\sum_{i=1}^{m}(\xi_{i}+\hat{\xi}_{i})$$
$$\text{s.t.}\ \begin{cases} f(\vec{x}_{i})-y_{i} \le \varepsilon+\xi_{i},\\ y_{i}-f(\vec{x}_{i}) \le \varepsilon+\hat{\xi}_{i},\\ \xi_{i} \ge 0,\ \hat{\xi}_{i} \ge 0, \quad i=1,2,\dots,m.\end{cases}$$
Two separate slack variables are used here because the violations on the two sides of the $\varepsilon$-tube need not be symmetric: $\xi_{i}$ measures how far $f(\vec{x}_{i})$ overshoots $y_{i}$, and $\hat{\xi}_{i}$ how far it undershoots, each penalized independently, as the numeric sketch below illustrates.
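At the optimum, each slack equals the one-sided tube violation, which the closed forms below express (they follow from minimizing over $\xi_{i},\hat{\xi}_{i}$ at fixed $\vec{w},b$; the function name and toy numbers are my own):

```python
import numpy as np

def slacks(f_x, y, eps):
    """Optimal slacks are the one-sided violations of the eps-tube:
    xi measures overshoot (f(x) above y + eps), xi_hat undershoot."""
    xi = np.maximum(0.0, f_x - y - eps)      # from f(x_i) - y_i <= eps + xi_i
    xi_hat = np.maximum(0.0, y - f_x - eps)  # from y_i - f(x_i) <= eps + xi_hat_i
    return xi, xi_hat

f_x = np.array([1.0, 1.0, 1.0])
y = np.array([0.7, 1.05, 1.4])   # below, inside, and above the eps=0.1 tube
print(slacks(f_x, y, eps=0.1))   # (array([0.2, 0., 0.]), array([0., 0., 0.3]))
```

Note that for each sample at most one of the two slacks is positive, consistent with the KKT condition $\xi_{i}\hat{\xi}_{i}=0$ derived later.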
Next, introduce Lagrange multipliers $\mu_{i} \ge 0$, $\hat{\mu}_{i} \ge 0$, $\alpha_{i} \ge 0$, $\hat{\alpha}_{i} \ge 0$ to form the Lagrangian:
$$\begin{aligned} L(\vec{w},b,\alpha,\hat{\alpha},\xi,\hat{\xi},\mu,\hat{\mu}) &= \frac{1}{2}\|\vec{w}\|^{2}+C\sum_{i=1}^{m}(\xi_{i}+\hat{\xi}_{i})-\sum_{i=1}^{m}\mu_{i}\xi_{i}-\sum_{i=1}^{m}\hat{\mu}_{i}\hat{\xi}_{i}\\ &\quad+\sum_{i=1}^{m}\alpha_{i}\big(f(\vec{x}_{i})-y_{i}-\varepsilon-\xi_{i}\big)+\sum_{i=1}^{m}\hat{\alpha}_{i}\big(y_{i}-f(\vec{x}_{i})-\varepsilon-\hat{\xi}_{i}\big) \end{aligned}$$
Setting the partial derivatives of $L$ with respect to $\vec{w}$, $b$, $\xi_{i}$, $\hat{\xi}_{i}$ to zero gives:
$$\vec{w}=\sum_{i=1}^{m}(\hat{\alpha}_{i}-\alpha_{i})\vec{x}_{i}$$
$$0=\sum_{i=1}^{m}(\hat{\alpha}_{i}-\alpha_{i})$$
$$C=\alpha_{i}+\mu_{i}=\hat{\alpha}_{i}+\hat{\mu}_{i}$$
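In detail, writing $f(\vec{x}_{i})=\vec{w}^{T}\vec{x}_{i}+b$, these three conditions come from:
$$\frac{\partial L}{\partial \vec{w}} = \vec{w}-\sum_{i=1}^{m}(\hat{\alpha}_{i}-\alpha_{i})\vec{x}_{i}=0,\qquad \frac{\partial L}{\partial b} = \sum_{i=1}^{m}(\alpha_{i}-\hat{\alpha}_{i})=0,$$
$$\frac{\partial L}{\partial \xi_{i}} = C-\alpha_{i}-\mu_{i}=0,\qquad \frac{\partial L}{\partial \hat{\xi}_{i}} = C-\hat{\alpha}_{i}-\hat{\mu}_{i}=0.$$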
Substituting these back into $L$ yields the dual problem:
$$\max_{\alpha,\hat{\alpha}}\ \sum_{i=1}^{m}\Big(y_{i}(\hat{\alpha}_{i}-\alpha_{i})-\varepsilon(\hat{\alpha}_{i}+\alpha_{i})\Big)-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(\hat{\alpha}_{i}-\alpha_{i})(\hat{\alpha}_{j}-\alpha_{j})\vec{x}_{i}^{T}\vec{x}_{j}$$
$$\text{s.t.}\ \sum_{i=1}^{m}(\hat{\alpha}_{i}-\alpha_{i})=0,\qquad 0 \le \alpha_{i},\hat{\alpha}_{i} \le C.$$
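As a sanity check, this dual is a small box-constrained QP that can be solved numerically; below is a minimal sketch using SciPy's SLSQP on made-up 1-D data (the variable layout stacking $\alpha$ and $\hat{\alpha}$ into one vector is an assumption of the sketch, not part of the derivation):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m = 20
X = np.sort(rng.uniform(-3, 3, size=(m, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(m)

C, eps = 10.0, 0.1
K = X @ X.T                          # Gram matrix of linear kernels x_i^T x_j

def neg_dual(beta):
    """Negated dual objective; beta stacks [alpha_1..alpha_m, alpha_hat_1..alpha_hat_m]."""
    a, a_hat = beta[:m], beta[m:]
    d = a_hat - a                    # the differences (alpha_hat_i - alpha_i)
    return 0.5 * d @ K @ d - y @ d + eps * np.sum(a + a_hat)

# Equality constraint sum_i (alpha_hat_i - alpha_i) = 0, box bounds 0 <= . <= C
cons = [{"type": "eq", "fun": lambda beta: np.sum(beta[m:] - beta[:m])}]
res = minimize(neg_dual, np.zeros(2 * m), method="SLSQP",
               bounds=[(0.0, C)] * (2 * m), constraints=cons)
alpha, alpha_hat = res.x[:m], res.x[m:]
```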
The KKT conditions are:
$$\begin{cases} \alpha_{i}\big(f(\vec{x}_{i})-y_{i}-\varepsilon-\xi_{i}\big)=0,\\ \hat{\alpha}_{i}\big(y_{i}-f(\vec{x}_{i})-\varepsilon-\hat{\xi}_{i}\big)=0,\\ \alpha_{i}\hat{\alpha}_{i}=0,\quad \xi_{i}\hat{\xi}_{i}=0,\\ (C-\alpha_{i})\xi_{i}=0,\quad (C-\hat{\alpha}_{i})\hat{\xi}_{i}=0. \end{cases}$$
Hence $\alpha_{i}$ and $f(\vec{x}_{i})-y_{i}-\varepsilon-\xi_{i}$ cannot both be nonzero, and likewise $\hat{\alpha}_{i}$ and $y_{i}-f(\vec{x}_{i})-\varepsilon-\hat{\xi}_{i}$ cannot both be nonzero.
Moreover, $f(\vec{x}_{i})-y_{i}-\varepsilon-\xi_{i}$ and $y_{i}-f(\vec{x}_{i})-\varepsilon-\hat{\xi}_{i}$ cannot both be zero: adding the two would give $-2\varepsilon-\xi_{i}-\hat{\xi}_{i}=0$, which is impossible for $\varepsilon>0$ (geometrically, a sample cannot lie on both edges of the tube at once, which is exactly what the support vectors mark). Therefore at least one of $\alpha_{i}$ and $\hat{\alpha}_{i}$ must be $0$.
Substituting back, the solution is:
$$f(\vec{x})=\sum_{i=1}^{m}(\hat{\alpha}_{i}-\alpha_{i})\vec{x}_{i}^{T}\vec{x}+b$$
For $b$: there must exist some $i$ with $0<\alpha_{i}<C$; then $(C-\alpha_{i})\xi_{i}=0$ forces $\xi_{i}=0$, and the first KKT condition gives $f(\vec{x}_{i})-y_{i}-\varepsilon=0$, from which:
$$b=y_{i}+\varepsilon-\sum_{j=1}^{m}(\hat{\alpha}_{j}-\alpha_{j})\vec{x}_{j}^{T}\vec{x}_{i}$$
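Continuing the SciPy sketch above (it reuses `alpha`, `alpha_hat`, `K`, `y`, `C`, `eps` from that block), $b$ can be recovered from any such free support vector; the tolerance is an arbitrary numerical choice:

```python
d = alpha_hat - alpha
tol = 1e-6                                    # arbitrary numerical tolerance
free = np.where((alpha > tol) & (alpha < C - tol))[0]  # indices with 0 < alpha_i < C
i = free[0]                                   # assumes a free support vector exists
b = y[i] + eps - K[i] @ d                     # b = y_i + eps - sum_j (alpha_hat_j - alpha_j) x_j^T x_i
f_train = K @ d + b                           # f(x) evaluated on the training inputs
```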
Adding a feature map and a kernel function gives:
$$f(\vec{x})=\sum_{i=1}^{m}(\hat{\alpha}_{i}-\alpha_{i})\kappa(\vec{x}_{i},\vec{x})+b$$
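In practice, this kernelized form is what libraries implement. For example, scikit-learn's SVR exposes $C$, $\varepsilon$, and the kernel $\kappa$ directly (the toy data below is made up):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

# kernel="rbf" uses kappa(x_i, x) = exp(-gamma * ||x_i - x||^2)
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(model.support_.size)   # number of support vectors (samples with alpha_hat_i - alpha_i != 0)
print(model.predict(X[:3]))  # predictions f(x) on the first few inputs
```

Only samples on or outside the $\varepsilon$-tube end up with nonzero coefficients, so the fitted model is sparse in the training set, just as the KKT analysis above predicts.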