**Regularized Normal Equation for Linear Regression.** Given a data set $\{x^{(i)}, y^{(i)}\}_{i=1,\dots,m}$ with $x^{(i)} \in \mathbb{R}^n$ and $y^{(i)} \in \mathbb{R}$, the general form of regularized linear regression is as follows:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right] \tag{1}$$

Derive the normal equation.
Let

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad Y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}.$$

Then

$$X\theta - Y = \begin{bmatrix} (x^{(1)})^T\theta \\ (x^{(2)})^T\theta \\ \vdots \\ (x^{(m)})^T\theta \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ h_\theta(x^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix},$$

so the loss function can be written as

$$J(\theta) = \frac{1}{2m}\left[(X\theta - Y)^T(X\theta - Y) + \lambda\,\theta^T L\,\theta\right],$$

where $L = \mathrm{diag}(0, 1, \dots, 1)$ is the $(n+1)\times(n+1)$ identity matrix with its top-left entry zeroed out, so that the bias term $\theta_0$ is excluded from the regularizer (the sum in (1) starts at $j = 1$).
Taking the gradient,

$$\nabla_\theta J(\theta) = \nabla_\theta\, \frac{1}{2m}\left[(X\theta - Y)^T(X\theta - Y) + \lambda\,\theta^T L\,\theta\right]$$
$$= \frac{1}{2m}\left[\nabla_\theta\, (X\theta - Y)^T(X\theta - Y) + \nabla_\theta\, \lambda\,\theta^T L\,\theta\right].$$
For the two terms,

$$\nabla_\theta\, (X\theta - Y)^T(X\theta - Y) = 2X^TX\theta - 2X^TY, \qquad \nabla_\theta\, \lambda\,\theta^T L\,\theta = 2\lambda L\theta,$$

using the symmetry of $L$. Therefore

$$\nabla_\theta J(\theta) = \frac{1}{m}\left(X^TX\theta - X^TY + \lambda L\theta\right).$$

Setting $\nabla_\theta J(\theta) = 0$: when the columns of $X$ are linearly independent, $X^TX$ is positive definite, so $X^TX + \lambda L$ is invertible and there is a unique solution

$$\theta = (X^TX + \lambda L)^{-1}X^TY.$$
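As a sanity check on this closed-form solution, here is a minimal NumPy sketch, assuming a design matrix whose first column is the all-ones bias column (the function name `ridge_normal_equation` and the synthetic data are illustrative, not from the original):

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Solve regularized linear regression in closed form.

    X   : (m, n+1) design matrix whose first column is all ones (bias).
    y   : (m,) target vector.
    lam : regularization strength lambda.
    """
    n_plus_1 = X.shape[1]
    # L is the identity with the (0, 0) entry zeroed out,
    # so the bias term theta_0 is not regularized.
    L = np.eye(n_plus_1)
    L[0, 0] = 0.0
    # theta = (X^T X + lambda * L)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
theta_true = np.array([1.0, 2.0, -3.0])
y = X @ theta_true + 0.1 * rng.normal(size=50)
print(ridge_normal_equation(X, y, lam=0.1))  # close to theta_true
```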
**Gaussian Discriminant Analysis Model.** Given $m$ training data $\{x^{(i)}, y^{(i)}\}_{i=1,\dots,m}$, assume that $y \sim \mathrm{Bernoulli}(\phi)$, $x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma)$, and $x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$. Hence, we have

$$p(y) = \phi^y (1-\phi)^{1-y},$$
$$p(x \mid y = 0) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_0)^T \Sigma^{-1} (x - \mu_0)\right),$$
$$p(x \mid y = 1) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)\right).$$

The log-likelihood function is

$$\ell(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma) = \sum_{i=1}^{m} \log\left[p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma)\, p(y^{(i)}; \phi)\right].$$

Solve for $\phi, \mu_0, \mu_1$ and $\Sigma$ by maximizing $\ell(\phi, \mu_0, \mu_1, \Sigma)$. Hint: $\nabla_X \mathrm{tr}(AX^{-1}B) = -(X^{-1}BAX^{-1})^T$, $\nabla_A |A| = |A|(A^{-1})^T$.
Derivation of the Gaussian Discriminant Analysis (GDA) formulas: setting the gradient of $\ell$ with respect to each parameter to zero yields the standard maximum-likelihood estimates

$$\phi = \frac{1}{m}\sum_{i=1}^{m} 1(y^{(i)} = 1), \qquad \mu_0 = \frac{\sum_{i=1}^{m} 1(y^{(i)} = 0)\, x^{(i)}}{\sum_{i=1}^{m} 1(y^{(i)} = 0)}, \qquad \mu_1 = \frac{\sum_{i=1}^{m} 1(y^{(i)} = 1)\, x^{(i)}}{\sum_{i=1}^{m} 1(y^{(i)} = 1)},$$
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu_{y^{(i)}}\right)\left(x^{(i)} - \mu_{y^{(i)}}\right)^T.$$
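As a companion to these formulas, here is a minimal NumPy sketch that computes the four estimates from 0/1-labeled data (the function name `gda_mle` and the data layout are illustrative assumptions):

```python
import numpy as np

def gda_mle(X, y):
    """Closed-form MLE for GDA with a shared covariance matrix.

    X : (m, n) data matrix, y : (m,) array of 0/1 labels.
    Returns phi, mu0, mu1, Sigma.
    """
    m = len(y)
    phi = np.mean(y == 1)          # Bernoulli parameter
    mu0 = X[y == 0].mean(axis=0)   # class-0 mean
    mu1 = X[y == 1].mean(axis=0)   # class-1 mean
    # Shared covariance: average outer products of the samples,
    # each one centered by its own class mean.
    mus = np.where((y == 1)[:, None], mu1, mu0)
    diff = X - mus
    Sigma = diff.T @ diff / m
    return phi, mu0, mu1, Sigma
```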
**MLE for Naive Bayes.** Consider the following definition of the MLE problem for multinomials. The input to the problem is a finite set $\mathcal{Y}$, and a weight $c_y > 0$ for each $y \in \mathcal{Y}$. The output from the problem is the distribution $p^*$ that solves the following maximization problem:

$$p^* = \arg\max_{p} \sum_{y\in\mathcal{Y}} c_y \log p_y$$

(i) Prove that the vector $p^*$ has components $p_y^* = \frac{c_y}{N}$ for all $y \in \mathcal{Y}$, where $N = \sum_{y\in\mathcal{Y}} c_y$. (Hint: use the theory of Lagrange multipliers.)

(ii) Using the above consequence, prove that the maximum-likelihood estimates for the Naive Bayes model are as follows:

$$p(y) = \frac{\sum_{i=1}^{m} 1(y^{(i)} = y)}{m} \qquad \text{and} \qquad p_j(x \mid y) = \frac{\sum_{i=1}^{m} 1(y^{(i)} = y \wedge x_j^{(i)} = x)}{\sum_{i=1}^{m} 1(y^{(i)} = y)}$$
(i) Define the Lagrangian

$$L(p, \alpha) = \sum_{y\in \mathcal{Y}} c_y \log p_y - \alpha\left(\sum_{y\in \mathcal{Y}} p_y - 1\right),$$

where $\alpha$ is the Lagrange multiplier. Taking the partial derivative with respect to each $p_y$ and setting it to zero,

$$\frac{\partial}{\partial p_y}L(p, \alpha) = \frac{c_y}{p_y} - \alpha = 0,$$

which gives $p_y^* = \frac{c_y}{\alpha}$. Substituting into the constraint $\sum_{y\in \mathcal{Y}} p_y^* = 1$ yields $\frac{\sum_{y\in \mathcal{Y}} c_y}{\alpha} = 1$. Since $N = \sum_{y\in \mathcal{Y}} c_y$, we get $\alpha = N$ and hence

$$p_y^* = \frac{c_y}{N}.$$
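In code, this result is simply count normalization; a minimal sketch (the helper name `mle_multinomial` is illustrative):

```python
from collections import Counter

def mle_multinomial(counts):
    """p*_y = c_y / N for a mapping of nonnegative weights c_y."""
    N = sum(counts.values())
    return {y: c / N for y, c in counts.items()}

print(mle_multinomial(Counter(["a", "a", "b", "c"])))
# {'a': 0.5, 'b': 0.25, 'c': 0.25}
```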
(ii) The Naive Bayes maximum-likelihood objective is

$$\max\ \sum_{i=1}^{m}\log p(y^{(i)}) + \sum_{i=1}^{m}\sum_{j=1}^{n}\log p_j(x_j^{(i)} \mid y^{(i)}).$$

Let the label set be $\mathcal{Y}$, with $k = |\mathcal{Y}|$ classes. Then $p(y)$ satisfies the constraint $\sum_{y\in \mathcal{Y}} p(y) = 1$; for each feature $j$ and each label $y$, $p_j(x \mid y)$ satisfies $\sum_{x} p_j(x \mid y) = 1$; and all probabilities are nonnegative.

Note that the two sums can be optimized independently. For the first sum, consider the optimization problem

$$\max\ \sum_{i=1}^{m}\log p(y^{(i)}) \qquad \text{s.t.} \quad \sum_{y\in \mathcal{Y}} p(y) = 1.$$

Treat the number of occurrences of label $y$ in the training set, $cnt(y) = \sum_{i=1}^{m} 1(y^{(i)} = y)$, as the weight $c_y$. Then

$$\max\ \sum_{i=1}^{m}\log p(y^{(i)}) = \max\ \sum_{y\in \mathcal{Y}} cnt(y)\log p(y),$$

and since $N = \sum_{y\in \mathcal{Y}} cnt(y) = m$, the result of part (i) gives

$$p^*(y) = \frac{cnt(y)}{m} = \frac{\sum_{i=1}^{m} 1(y^{(i)} = y)}{m}.$$

Similarly, treat the number of training samples with label $y$ in which feature $j$ takes the value $x$, $cnt(x_j = x \mid y) = \sum_{i=1}^{m} 1(y^{(i)} = y \wedge x_j^{(i)} = x)$, as the weight in part (i). Then

$$\max\ \sum_{i=1}^{m}\sum_{j=1}^{n}\log p_j(x_j^{(i)} \mid y^{(i)}) = \max\ \sum_{j=1}^{n}\sum_{y\in \mathcal{Y}}\sum_{x} cnt(x_j = x \mid y)\log p_j(x \mid y),$$

which decomposes into an independent multinomial problem for each pair $(j, y)$. Since $\sum_{x} cnt(x_j = x \mid y) = cnt(y)$, the result of part (i) gives

$$p_j^*(x \mid y) = \frac{cnt(x_j = x \mid y)}{cnt(y)} = \frac{\sum_{i=1}^{m} 1(y^{(i)} = y \wedge x_j^{(i)} = x)}{\sum_{i=1}^{m} 1(y^{(i)} = y)},$$

which completes the proof.
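Both estimates above are plain count ratios, which the following NumPy sketch computes directly from a discrete training set (the function name `naive_bayes_mle` and the dictionary-based return format are illustrative assumptions):

```python
import numpy as np

def naive_bayes_mle(X, y):
    """MLE for a discrete Naive Bayes model.

    X : (m, n) integer feature matrix, y : (m,) integer labels.
    Returns p_y[label] and p_xj[(j, value, label)], matching the
    count-ratio formulas derived above.
    """
    m, n = X.shape
    labels, label_counts = np.unique(y, return_counts=True)
    # p*(y) = cnt(y) / m
    p_y = {lab: cnt / m for lab, cnt in zip(labels, label_counts)}
    # p*_j(x | y) = cnt(x_j = x | y) / cnt(y)
    p_xj = {}
    for lab, cnt in zip(labels, label_counts):
        rows = X[y == lab]
        for j in range(n):
            values, vcounts = np.unique(rows[:, j], return_counts=True)
            for v, vc in zip(values, vcounts):
                p_xj[(j, v, lab)] = vc / cnt
    return p_y, p_xj
```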