[Derivation] How to obtain the standard errors of regression coefficients from the regression mean squared error

2024/11/14 22:00:30

Understanding the standard error (standard deviation) of linear regression coefficients

1. Generate the data

| y | x | e (residual) |
| --- | --- | --- |
| 3.6 | 1 | 0.63 |
| 3.4 | 2 | -1.38 |
| 7.6 | 3 | 1.01 |
| 7.4 | 4 | -1.01 |
| 11.6 | 5 | 1.38 |
| 11.4 | 6 | -0.63 |
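The residual column $e$ can be reproduced outside Stata. The sketch below is a minimal Python version (standard library only, not the author's code; the helper name `ols_fit` is hypothetical) that fits $y$ on $x$ with the closed-form simple-regression formulas and prints $y$, $x$, and the residual $e=y-\hat{y}$ rounded to two decimals:

```python
# Sketch: reproduce the residual column e with closed-form simple OLS (standard library only).
ys = [3.6, 3.4, 7.6, 7.4, 11.6, 11.4]
xs = [1, 2, 3, 4, 5, 6]


def ols_fit(x, y):
    """Hypothetical helper: return (b0, b1) for y = b0 + b1*x via closed-form OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_fit(xs, ys)
fitted = [b0 + b1 * xi for xi in xs]                  # y-hat
residuals = [yi - fi for yi, fi in zip(ys, fitted)]   # e = y - y-hat

print("y     x  e")
for xi, yi, ei in zip(xs, ys, residuals):
    print(f"{yi:<5} {xi}  {ei:.2f}")
```

The printed residuals should match the table: 0.63, -1.38, 1.01, -1.01, 1.38, -0.63.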


2. Regression

Population model: $y=\beta_{0}+\beta_{1}x+\epsilon$

Fitted model: $y_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{i}+e_{i}$, where $e_{i}=y_{i}-\hat{y}_{i}$ is the residual.

reg y x

      Source |       SS           df       MS      Number of obs   =         6
-------------+----------------------------------   F(1, 4)         =     34.60
       Model |   57.422285         1   57.422285   Prob > F        =    0.0042
    Residual |  6.63771505         4  1.65942876   R-squared       =    0.8964
-------------+----------------------------------   Adj R-squared   =    0.8705
       Total |  64.0600001         5      12.812   Root MSE        =    1.2882

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.811429   .3079359     5.88   0.004     .9564615    2.666396
       _cons |       1.16   1.199238     0.97   0.388    -2.169618    4.489618
------------------------------------------------------------------------------
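The `Coefficient` column can be checked by hand with the closed-form simple-regression formulas (a worked example using the six observations above):

$$
\bar{x}=3.5,\quad \bar{y}=7.5,\quad \sum_{i=1}^{6}(x_{i}-\bar{x})^{2}=17.5,\quad \sum_{i=1}^{6}(x_{i}-\bar{x})(y_{i}-\bar{y})=31.7
$$

$$
\hat{\beta}_{1}=\frac{31.7}{17.5}\approx 1.811429,\qquad \hat{\beta}_{0}=\bar{y}-\hat{\beta}_{1}\bar{x}=7.5-\frac{31.7}{17.5}\times 3.5=7.5-6.34=1.16
$$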

3. Computing the standard errors by hand

(1) SSE, SSR, SST

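$SSE$: Sum of Squares Error (the residual sum of squares, Stata's `Residual` row):
$SSE=\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^2=\sum_{i=1}^{n}(e_{i}-\bar{e})^2$, where the second equality uses $e_{i}=y_{i}-\hat{y}_{i}$ and $\bar{e}=0$ (OLS residuals average to zero when the model includes an intercept).
In this example, $SSE=(3.6-2.97)^2+(3.4-4.78)^2+(7.6-6.59)^2+(7.4-8.41)^2+(11.6-10.22)^2+(11.4-12.03)^2\approx 6.6377$; the fitted values are shown rounded to two decimals, while the sum uses the unrounded values (Stata reports 6.63771505).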

$SSR$: Sum of Squares of the Regression (Stata's `Model` row):
$SSR=\sum_{i=1}^{n}(\hat{y}_{i}-\bar{y})^2$
$SST$: Total Sum of Squares (Stata's `Total` row):
$SST=\sum_{i=1}^{n}(y_{i}-\bar{y})^2=SSR+SSE$ (for OLS with an intercept)

(2) MSE

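The mean squared error (MSE) estimates the error variance, and its square root $s$ is the standard error of the regression (Stata's `Root MSE`):
$s^{2}=MSE=\frac{SSE}{n-K}=\frac{\sum_{i=1}^{n}(e_{i}-\bar{e})^2}{n-K}$, where $K$ is the number of estimated coefficients (here $K=2$: intercept and slope).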

$s=\sqrt{MSE}$

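$s^2=\frac{6.637713}{6-2}\approx 1.659428,\qquad s=\sqrt{1.659428}\approx 1.288188$, matching the `Residual` MS (1.65942876) and the `Root MSE` (1.2882) in the Stata output.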

(3) SE of the slope coefficient

$S_{\hat{\beta}_{1}}=\sqrt{\frac{\frac{1}{n-2}\sum_{i=1}^{n}e_{i}^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}=\frac{s}{\sqrt{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}$
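Where this formula comes from, sketched under the usual OLS assumptions (errors independent across observations with constant variance $\sigma^{2}$, and $x$ treated as fixed): the slope estimator is a weighted sum of the $y_{i}$,

$$
\hat{\beta}_{1}=\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\sum_{i=1}^{n}w_{i}y_{i},\qquad w_{i}=\frac{x_{i}-\bar{x}}{\sum_{j=1}^{n}(x_{j}-\bar{x})^{2}}
$$

(the $\bar{y}$ term drops out because $\sum_{i}(x_{i}-\bar{x})=0$), so

$$
\operatorname{Var}(\hat{\beta}_{1})=\sigma^{2}\sum_{i=1}^{n}w_{i}^{2}=\frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}.
$$

Replacing the unknown $\sigma^{2}$ with its estimate $s^{2}=MSE=SSE/(n-2)$ and taking the square root gives the standard error $s/\sqrt{\sum_{i}(x_{i}-\bar{x})^{2}}$.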

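$S_{\hat{\beta}_{1}}=\sqrt{\frac{\frac{1}{4}\times 6.637713}{(1-3.5)^2+(2-3.5)^2+(3-3.5)^2+(4-3.5)^2+(5-3.5)^2+(6-3.5)^2}}=\sqrt{\frac{1.659428}{17.5}}\approx 0.307936$, which matches the `Std. err.` of `x` (.3079359) in the Stata output.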

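Why can the SE be large?

Small sample: with few observations, $\sum(x_{i}-\bar{x})^{2}$ has few terms and tends to be small, which inflates the SE.
Many extreme values (outliers) in $y$: large residuals inflate $SSE$ and hence $s$.
Little variation in $X$: when the $x_{i}$ cluster close to $\bar{x}$, the denominator $\sum(x_{i}-\bar{x})^{2}$ is small, so the SE is large; a wider spread of $X$ lowers the SE.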

Appendix

1. Stata code for the data and the plot

clear
set obs 6
* y values of the example; x is the observation number 1..6
gen y = 3.6 in 1
replace y = 3.4 in 2
replace y = 7.6 in 3
replace y = 7.4 in 4
replace y = 11.6 in 5
replace y = 11.4 in 6
gen x = _n

* OLS fit; predict with no option stores the linear prediction (fitted values)
reg y x
predict xb

* residuals, displayed with two decimals
gen e = y - xb
format %9.2f xb
format %9.2f e

* midpoint of y and xb: vertical position for each residual label
egen addtext_mean = rowmean(y xb)
forvalues i = 1/6 {
	quietly su addtext_mean in `i'
	global y`i' = r(mean)
	quietly su e in `i'
	global e`i' = r(mean)
}

* data points, fitted line, fitted values, residual spikes, residual labels
tw (scatter y x, mlab(y) mlabp(1)) ///
   (lfit y x) ///
   (scatter xb x, mlab(xb) mlabp(1)) ///
   (rspike y xb x), legend(off) ///
   text($y1 0.9 "0.63", size(vsmall) color(red)) ///
   text($y2 1.9 "-1.38", size(vsmall) color(red)) ///
   text($y3 2.9 "1.01", size(vsmall) color(red)) ///
   text($y4 3.9 "-1.01", size(vsmall) color(red)) ///
   text($y5 4.9 "1.38", size(vsmall) color(red)) ///
   text($y6 5.9 "-0.63", size(vsmall) color(red))