Understanding the Standard Deviation and Standard Error of Linear Regression Coefficients
1. Generate the data
y | x | e |
---|---|---|
3.6 | 1 | 0.63 |
3.4 | 2 | -1.38 |
7.6 | 3 | 1.01 |
7.4 | 4 | -1.01 |
11.6 | 5 | 1.38 |
11.4 | 6 | -0.63 |
2. Regression
$$y = \beta_{0}+\beta_{1}x+\epsilon$$
$$y_{i}=\beta_{0}+\beta_{1} x_{i}+e_{i}$$
reg y x
      Source |       SS           df       MS      Number of obs   =         6
-------------+----------------------------------   F(1, 4)         =     34.60
       Model |   57.422285         1   57.422285   Prob > F        =    0.0042
    Residual |  6.63771505         4  1.65942876   R-squared       =    0.8964
-------------+----------------------------------   Adj R-squared   =    0.8705
       Total |  64.0600001         5      12.812   Root MSE        =    1.2882
------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   1.811429   .3079359     5.88   0.004     .9564615    2.666396
       _cons |       1.16   1.199238     0.97   0.388    -2.169618    4.489618
------------------------------------------------------------------------------
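As a cross-check on the Stata output, the slope and intercept can be reproduced from the closed-form formulas for simple regression. The sketch below uses plain Python rather than Stata; the variable names (`s_xy`, `s_xx`, and so on) are illustrative and do not come from the original post.

```python
# Closed-form simple OLS on the six observations from section 1.
x = [1, 2, 3, 4, 5, 6]
y = [3.6, 3.4, 7.6, 7.4, 11.6, 11.4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squares around the sample means
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

beta1 = s_xy / s_xx            # slope, the coefficient on x
beta0 = y_bar - beta1 * x_bar  # intercept, Stata's _cons

print(f"beta1 = {beta1:.6f}, beta0 = {beta0:.2f}")
```

The printed values should agree with the `x` and `_cons` rows of the coefficient table.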
3. Computing the standard error of the regression
(1) SSE, SSR, and SST
$SSE$: Sum of Squares Error,
$$SSE= \sum_{i=1}^{n}(\hat{y}_{i}-y_{i})^2 = \sum_{i=1}^{n}(e_{i}-\bar{e})^2$$
In this example,
$$SSE=(3.6-2.97)^2+(3.4-4.78)^2+(7.6-6.59)^2+(7.4-8.41)^2+(11.6-10.22)^2+(11.4-12.03)^2 = 6.637713$$
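The residuals in the data table and the SSE can be recomputed from the fitted line. This is a minimal Python sketch, assuming the rounded coefficients printed in the Stata output; `fitted` and `residuals` are names introduced here for illustration.

```python
# Recompute fitted values, residuals, and SSE for the six observations.
x = [1, 2, 3, 4, 5, 6]
y = [3.6, 3.4, 7.6, 7.4, 11.6, 11.4]

# Coefficients as reported by `reg y x` (rounded as printed by Stata)
beta0, beta1 = 1.16, 1.811429

fitted = [beta0 + beta1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# With an intercept in the model the residuals average to zero,
# so the sum of squared residuals equals sum((e_i - e_bar)^2).
sse = sum(e ** 2 for e in residuals)

print([round(e, 2) for e in residuals])
print(round(sse, 4))
```

The rounded residuals should match the `e` column of the table in section 1.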
$SSR$: Sum of Squares of the Regression
$$SSR= \sum_{i=1}^{n}(\hat{y}_{i}-\bar{y})^2$$
$SST$: Total Sum of Squares
$$SST= \sum_{i=1}^{n}(y_{i}-\bar{y})^2$$
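The three sums of squares are linked by SST = SSR + SSE, and their ratio SSR/SST is the R-squared in the Stata output. The Python sketch below checks this on the example data; it refits the line with the closed-form formulas so the decomposition holds to floating-point precision, and all variable names are illustrative.

```python
# Verify SST = SSR + SSE and R-squared = SSR / SST on the example data.
x = [1, 2, 3, 4, 5, 6]
y = [3.6, 3.4, 7.6, 7.4, 11.6, 11.4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS fit (exact coefficients, not the rounded printout)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
beta0 = y_bar - beta1 * x_bar
fitted = [beta0 + beta1 * xi for xi in x]

sse = sum((fi - yi) ** 2 for fi, yi in zip(fitted, y))  # residual variation
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation

r_squared = ssr / sst
print(round(sse, 4), round(ssr, 4), round(sst, 4), round(r_squared, 4))
```

These correspond to the Residual, Model, and Total rows of the SS column in the regression output.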
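(2) MSE
The standard error of the regression is the square root of the mean squared error, where $K$ is the number of estimated coefficients including the constant (here $K=2$):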
$$s^{2}=MSE=\frac{SSE}{n-K}=\frac{\sum_{i=1}^{n}(e_{i}-\bar{e})^2}{n-K}$$
$$s=\sqrt{MSE}$$
$$s^2 = \frac{6.637713}{6 - 2}=1.6594282; \qquad s=1.288188$$
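The arithmetic above is small enough to check directly. A minimal Python sketch, taking the SSE value from the worked example as given:

```python
import math

sse = 6.637713  # SSE from the worked example above
n = 6           # number of observations
K = 2           # estimated coefficients: constant and slope

mse = sse / (n - K)  # s^2, the Residual MS in the Stata output
s = math.sqrt(mse)   # standard error of the regression (Root MSE)

print(f"s^2 = {mse:.7f}, s = {s:.6f}")
```

The value of `s` should agree with the Root MSE of 1.2882 reported by Stata.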
(3) SE
$$S_{\hat{\beta}} = \sqrt{\frac{\frac{1}{n-2}\sum_{i=1}^{n} \hat{e}_{i}^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^2}}$$
$$S_{\hat{\beta}} = \sqrt{\frac{\frac{1}{4} \times 6.637713}{(1-3.5)^2+(2-3.5)^2+(3-3.5)^2+(4-3.5)^2+(5-3.5)^2+(6-3.5)^2}} = \sqrt{\frac{1.6594282}{17.5}} \approx 0.3079$$
This matches the Std. err. reported for x in the regression output.
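The slope's standard error and its t statistic follow from two numbers: the MSE and the sum of squared deviations of x. The Python sketch below assumes the MSE computed in (2) and the slope printed by Stata; the names `s_xx`, `se_beta1`, and `t_stat` are introduced here for illustration.

```python
import math

x = [1, 2, 3, 4, 5, 6]
mse = 1.6594282   # s^2 from step (2)
beta1 = 1.811429  # slope as printed in the Stata output

x_bar = sum(x) / len(x)
s_xx = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared deviations of x

se_beta1 = math.sqrt(mse / s_xx)  # standard error of the slope
t_stat = beta1 / se_beta1         # t statistic for H0: beta1 = 0

print(f"S_xx = {s_xx}, SE = {se_beta1:.6f}, t = {t_stat:.2f}")
```

Both values should line up with the `Std. err.` and `t` columns of the `x` row.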
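Why might the SE be large?
- Small sample: $n-2$ is small, which inflates $s^2$, and few observations contribute to $\sum(x_i-\bar{x})^2$, so the denominator is small
- Many extreme values: large residuals inflate $SSE$ and hence the numerator
- X has little spread (the deviations of X from its mean are small), so the denominator is small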
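The SE formula in (3) shows which way each factor pushes: the SE rises with $s^2$ and falls as $\sum(x_i-\bar{x})^2$ grows. The Python sketch below varies one ingredient at a time. It is only an illustration: `slope_se` is a hypothetical helper, the alternative x designs are made up, and the MSE is held fixed across designs as a simplifying assumption.

```python
import math

def slope_se(x, mse):
    """Standard error of the OLS slope: sqrt(mse / sum((x_i - x_bar)^2))."""
    x_bar = sum(x) / len(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return math.sqrt(mse / s_xx)

mse = 1.6594282  # s^2 from the worked example

baseline = slope_se([1, 2, 3, 4, 5, 6], mse)
narrow = slope_se([3.0, 3.2, 3.4, 3.6, 3.8, 4.0], mse)  # x squeezed toward its mean
wide = slope_se([2, 4, 6, 8, 10, 12], mse)              # x spread twice as far
noisy = slope_se([1, 2, 3, 4, 5, 6], 4 * mse)           # residual variance four times larger
fewer = slope_se([1, 2, 3], mse)                        # only three observations

for label, value in [("baseline", baseline), ("narrow x", narrow),
                     ("wide x", wide), ("noisy", noisy), ("fewer obs", fewer)]:
    print(f"{label:10s} SE = {value:.4f}")
```

Shrinking the spread of x by a factor of five multiplies the SE by five, while doubling the spread halves it.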
Appendix
1. simulation code
* Build the six-observation example data set
clear
set obs 6
gen y = 3.6 in 1
replace y = 3.4 in 2
replace y = 7.6 in 3
replace y = 7.4 in 4
replace y = 11.6 in 5
replace y = 11.4 in 6
gen x = _n
* Fit the regression, then recover fitted values and residuals
reg y x
predict xb
gen e = y - xb
format %9.2f xb
format %9.2f e
* Midpoint between observed and fitted y: vertical position of each residual label
egen addtext_mean = rowmean(y xb)
forv i = 1/6 {
    su addtext_mean in `i', d
    global y`i' = r(mean)
    su e in `i', d
    global e`i' = r(mean)
}
* Observed points, fitted line, fitted points, and residual spikes with residual labels
tw (scatter y x, mlab(y) mlabp(1)) ///
(lfit y x) ///
(scatter xb x, mlab(xb) mlabp(1)) ///
(rspike y xb x), legend(off) ///
text($y1 0.9 "0.63", size(vsmall) color(red)) ///
text($y2 1.9 "-1.38", size(vsmall) color(red)) ///
text($y3 2.9 "1.01", size(vsmall) color(red)) ///
text($y4 3.9 "-1.01", size(vsmall) color(red)) ///
text($y5 4.9 "1.38", size(vsmall) color(red)) ///
text($y6 5.9 "-0.63", size(vsmall) color(red))
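2. Serial correlation; homoskedasticity or heteroskedasticity
When the following five assumptions all hold: ① linearity in parameters, ② no strict multicollinearity, ③ random sampling, ④ strict exogeneity, and ⑤ spherical disturbances (conditional homoskedasticity plus no autocorrelation),
the OLS estimator is BLUE, the best linear unbiased estimator.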
In this case, the variance of $\hat{\beta}_{1}$ conditional on $x$ is:
$$Var(\hat{\beta}_{1}|x)=Var\left(\beta_{1}+\frac{\sum(x_{i}-\bar{x})e_{i}}{\sum(x_{i}-\bar{x})^2}\,\Big|\,x\right)$$
$$Var(\hat{\beta}_{1}|x)=\frac{Var(\sum(x_{i}-\bar{x})e_{i}|x)}{[\sum(x_{i}-\bar{x})^2]^2}$$
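- If the errors are serially uncorrelated, the variance of the sum equals the sum of the variances. Assuming $Var(e_i|x)=\sigma^2$:
$$\begin{aligned} Var(\hat{\beta}_{1}|x) &= \frac{\sum(x_{i}-\bar{x})^2\,Var(e_{i}|x)}{[\sum(x_{i}-\bar{x})^2]^2} \\ &= \frac{\sigma^2\sum(x_{i}-\bar{x})^2}{[\sum(x_{i}-\bar{x})^2]^2} \\ &= \frac{\sigma^2}{\sum(x_{i}-\bar{x})^2} \end{aligned}$$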
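- Serial correlation:
$$\hat{\sigma}^2=\frac{\sum e_{i}^2}{n-k-1}$$
Here $k$ counts the regressors excluding the constant, so $n-k-1=4$ in this example, the same degrees of freedom as $n-K$ in section 3.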
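With serially uncorrelated, homoskedastic errors, the conditional variance of the slope reduces to $\sigma^2/\sum(x_i-\bar{x})^2$, and this can be checked by simulation. The Python sketch below is a Monte Carlo illustration under stated assumptions: the "true" parameters, the normal error distribution, the seed, and the number of replications are all arbitrary choices made here, not values from the original post.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, used only for reproducibility

x = [1, 2, 3, 4, 5, 6]                 # x is held fixed across replications
beta0, beta1, sigma = 1.16, 1.81, 1.0  # hypothetical true parameters
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

def simulate_slope():
    """Draw one sample with i.i.d. normal errors and return the OLS slope."""
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx

slopes = [simulate_slope() for _ in range(20000)]

empirical_var = statistics.pvariance(slopes)  # variance across replications
theoretical_var = sigma ** 2 / s_xx           # sigma^2 / sum((x_i - x_bar)^2)

print(f"empirical {empirical_var:.5f} vs theoretical {theoretical_var:.5f}")
```

The two variances should be close, and the mean of the simulated slopes should sit near the assumed `beta1`, which reflects unbiasedness.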
3. Calculate SE in MATLAB
X = [ones(6,1), (1:6)'];  % design matrix: constant and x
sqrt(diag(inv(X'*X)*1.6594282))  % standard errors of _cons and x
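The matrix formula $s^2 (X'X)^{-1}$ can also be worked out by hand for this two-column design, which makes it clear that only the diagonal entries are variances. The Python sketch below writes out the 2×2 inverse explicitly instead of calling a linear-algebra library; the variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5, 6]
mse = 1.6594282  # s^2 from section 3
n = len(x)

# X'X for the design matrix X = [1, x] is [[n, sum_x], [sum_x, sum_x2]]
sum_x = sum(x)
sum_x2 = sum(xi ** 2 for xi in x)
det = n * sum_x2 - sum_x ** 2

# Entries of s^2 * inv(X'X), using the closed-form 2x2 inverse
var_cons = mse * sum_x2 / det  # variance of the intercept estimate
var_x = mse * n / det          # variance of the slope estimate
cov_cons_x = -mse * sum_x / det  # covariance term; negative here, so no real square root

se_cons = math.sqrt(var_cons)
se_x = math.sqrt(var_x)

print(f"SE(_cons) = {se_cons:.6f}, SE(x) = {se_x:.6f}, cov = {cov_cons_x:.6f}")
```

The two standard errors should match the `Std. err.` column of the Stata output for `_cons` and `x`.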