Reference: https://zhuanlan.zhihu.com/p/273595649
I. Forward propagation
1. First layer
(1) Linear layer
$$
\left\{ \begin{array}{c} z_1^{(1)}= w_{11}^{(1)}*x_1+w_{12}^{(1)}*x_2+b_1^{(1)}\\ z_2^{(1)}= w_{21}^{(1)}*x_1+w_{22}^{(1)}*x_2+b_2^{(1)} \\ z_3^{(1)}= w_{31}^{(1)}*x_1+w_{32}^{(1)}*x_2+b_3^{(1)} \end{array}\right.
$$
(2) Nonlinear layer
$$
\left\{ \begin{array}{c} a_1^{(1)}= f(z_1^{(1)})\\ a_2^{(1)}= f(z_2^{(1)})\\ a_3^{(1)}= f(z_3^{(1)}) \end{array}\right.
$$
(3) Matrix form
$$
X^{(1)}= \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
$$
$$
Z^{(1)}= \begin{bmatrix} z_1^{(1)} \\ z_2^{(1)} \\ z_3^{(1)} \end{bmatrix}
$$
$$
W^{(1)}= \begin{bmatrix} w_{11}^{(1)} & w_{12}^{(1)}\\ w_{21}^{(1)} & w_{22}^{(1)} \\ w_{31}^{(1)} & w_{32}^{(1)} \end{bmatrix}
$$
$$
B^{(1)}= \begin{bmatrix} b_1^{(1)} \\ b_2^{(1)} \\ b_3^{(1)} \end{bmatrix}
$$
$$
Z^{(1)}=W^{(1)}*X^{(1)}+B^{(1)}
$$
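The matrix form of the first layer maps directly onto array code. The following is a minimal NumPy sketch, assuming a sigmoid for the activation f (the text leaves f unspecified) and arbitrary illustrative values for the input, weights, and bias. The names `sigmoid`, `X1`, `W1`, `B1`, `Z1`, and `A1` are chosen here for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Assumed activation f; the text does not fix a particular choice."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only (not taken from the text).
X1 = np.array([[1.0], [2.0]])             # input, shape (2, 1)
W1 = np.array([[0.1, 0.2],
               [0.3, 0.4],
               [0.5, 0.6]])                # weights, shape (3, 2)
B1 = np.array([[0.01], [0.02], [0.03]])   # bias, shape (3, 1)

Z1 = W1 @ X1 + B1    # linear layer: one matrix product plus bias
A1 = sigmoid(Z1)     # nonlinear layer, applied element-wise

# Cross-check the first component against the scalar formula for z_1^{(1)}.
z1_scalar = W1[0, 0] * X1[0, 0] + W1[0, 1] * X1[1, 0] + B1[0, 0]
```

The shapes make the bookkeeping explicit: a weight matrix of shape (3, 2) times an input of shape (2, 1) yields the three pre-activations of the first layer.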
2. Second layer
$$
\left\{ \begin{array}{c} z_1^{(2)}= w_{11}^{(2)}*a_1^{(1)}+w_{12}^{(2)}*a_2^{(1)}+w_{13}^{(2)}*a_3^{(1)}+b_1^{(2)}\\ z_2^{(2)}= w_{21}^{(2)}*a_1^{(1)}+w_{22}^{(2)}*a_2^{(1)}+w_{23}^{(2)}*a_3^{(1)}+b_2^{(2)} \\ z_3^{(2)}= w_{31}^{(2)}*a_1^{(1)}+w_{32}^{(2)}*a_2^{(1)}+w_{33}^{(2)}*a_3^{(1)}+b_3^{(2)} \end{array}\right.
$$
Nonlinear layer
$$
\left\{ \begin{array}{c} a_1^{(2)}= f(z_1^{(2)})\\ a_2^{(2)}= f(z_2^{(2)})\\ a_3^{(2)}= f(z_3^{(2)}) \end{array}\right.
$$
3. Third layer
$$
\left\{ \begin{array}{c} z_1^{(3)}= w_{11}^{(3)}*a_1^{(2)}+w_{12}^{(3)}*a_2^{(2)}+w_{13}^{(3)}*a_3^{(2)}+b_1^{(3)}\\ z_2^{(3)}= w_{21}^{(3)}*a_1^{(2)}+w_{22}^{(3)}*a_2^{(2)}+w_{23}^{(3)}*a_3^{(2)}+b_2^{(3)} \end{array}\right.
$$
Nonlinear layer
$$
\left\{ \begin{array}{c} a_1^{(3)}= f(z_1^{(3)})\\ a_2^{(3)}= f(z_2^{(3)}) \end{array}\right.
$$
4. Output layer
$$
\left\{ \begin{array}{c} y_1= a_1^{(3)}\\ y_2=a_2^{(3)} \end{array}\right.
$$
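The three layers and the output layer can be chained into a single forward pass. The sketch below assumes a sigmoid activation for every layer (the text only writes f), random illustrative parameters, and a hypothetical helper named `forward`; none of these come from the original text. The layer sizes 2, 3, 3, 2 follow the equations above.

```python
import numpy as np

def sigmoid(z):
    """Assumed activation f; the text does not fix a particular choice."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params, f=sigmoid):
    """Run the network layer by layer.

    Returns the output y and a list of (z, a) pairs, one per layer.
    """
    a = x
    cache = []
    for W, b in params:
        z = W @ a + b      # linear layer
        a = f(z)           # nonlinear layer
        cache.append((z, a))
    return a, cache        # output layer: y equals the last activation

# Illustrative random parameters for a 2-3-3-2 network.
rng = np.random.default_rng(0)
sizes = [2, 3, 3, 2]
params = [(rng.standard_normal((n_out, n_in)), rng.standard_normal((n_out, 1)))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = np.array([[0.5], [-1.0]])
y, cache = forward(x, params)
```

Keeping the per-layer `(z, a)` pairs is a design choice made with backward propagation in mind, since the error terms need both the pre-activations and the activations of each layer.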
5. Computing the loss function
$$
loss = \frac{1}{2}|y_{ref} - y_{des}|^2
$$
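Expanding the loss over the two output components, with $y_{ref} = (y_1, y_2)^T$ taken to be the network output and $y_{des}$ the desired output: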
$$
loss = \frac{1}{2}[(y_1 - y_{1_{des}})^2+(y_2 - y_{2_{des}})^2]
$$
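As a quick numerical illustration of the expanded loss, here is a minimal sketch with made-up output and target values. The function name `loss_fn` and the numbers are assumptions for illustration, not part of the original text.

```python
import numpy as np

def loss_fn(y, y_des):
    """Half the squared Euclidean distance between output and target."""
    diff = y - y_des
    return 0.5 * float(np.sum(diff ** 2))

# Illustrative values only.
y = np.array([0.8, 0.3])        # network output (y_1, y_2)
y_des = np.array([1.0, 0.0])    # desired output

value = loss_fn(y, y_des)

# The same quantity written out term by term, as in the expanded formula.
expanded = 0.5 * ((y[0] - y_des[0]) ** 2 + (y[1] - y_des[1]) ** 2)
```

The factor 1/2 is there so that differentiating the square leaves a plain difference, which keeps the error terms in backward propagation free of a stray factor of 2.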
II. Backward propagation
1. Updating the parameters
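A common way to write the parameter update, assuming plain gradient descent with a learning rate $\eta$ (a symbol introduced here, not defined in the text), is:

$$
w_{ij}^{(l)} \leftarrow w_{ij}^{(l)} - \eta \frac{\partial loss}{\partial w_{ij}^{(l)}}, \qquad b_{i}^{(l)} \leftarrow b_{i}^{(l)} - \eta \frac{\partial loss}{\partial b_{i}^{(l)}}
$$

Under this assumption, the whole task of backward propagation reduces to obtaining the two partial derivatives on the right-hand side for every layer $l$.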
2. Defining the error term
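The usual convention, assumed here since the definition is not written out in the text, is to define the error term of unit $i$ in layer $l$ as the derivative of the loss with respect to that unit's pre-activation:

$$
\delta_i^{(l)} = \frac{\partial loss}{\partial z_i^{(l)}}
$$

With this convention there is one error term per unit: three each for layers 1 and 2, and two for layer 3.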
3. Computing the two partial derivatives from the error term
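As a sketch of why the error term is useful, assume it is defined as $\delta_i^{(l)} = \partial loss / \partial z_i^{(l)}$ and write $a_j^{(0)} = x_j$ for the inputs (both are conventions assumed here). Since $z_i^{(l)} = \sum_j w_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)}$, the chain rule gives the two partial derivatives directly:

$$
\frac{\partial loss}{\partial w_{ij}^{(l)}} = \delta_i^{(l)} \, a_j^{(l-1)}, \qquad \frac{\partial loss}{\partial b_i^{(l)}} = \delta_i^{(l)}
$$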
(1) Computing δ for the last layer
Error term of a single unit
Combining the three formulas above:
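For the network and loss defined in Part I, and assuming the error term is defined as $\delta_i^{(3)} = \partial loss / \partial z_i^{(3)}$, the last-layer error term can be sketched as a product of three factors:

$$
\delta_i^{(3)} = \frac{\partial loss}{\partial y_i} \cdot \frac{\partial y_i}{\partial a_i^{(3)}} \cdot \frac{\partial a_i^{(3)}}{\partial z_i^{(3)}} = (y_i - y_{i_{des}}) \cdot 1 \cdot f'(z_i^{(3)})
$$

The middle factor is 1 because the output layer simply copies the activation, $y_i = a_i^{(3)}$.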
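(2) Deriving δ for the preceding layer
Applying the chain rule along the forward pass:
The first three factors of each term are exactly the error term of the following layer:
Evaluating the three partial derivatives in the formula above gives:
Collecting the error terms of the same layer, this can be written in general form: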
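The recursion is commonly written in matrix form as $\delta^{(l)} = \left((W^{(l+1)})^T \delta^{(l+1)}\right) \odot f'(Z^{(l)})$, where $\odot$ is the element-wise product. The sketch below implements that form for the 2-3-3-2 network and compares one gradient entry against a finite-difference estimate. It assumes a sigmoid activation, random illustrative parameters, and hypothetical helper names (`forward`, `backward`, `loss_fn`); none of these come from the original text.

```python
import numpy as np

def sigmoid(z):
    """Assumed activation f; the text does not fix a particular choice."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(x, params):
    """Return pre-activations zs and activations acts (acts[0] is the input)."""
    acts, zs = [x], []
    for W, b in params:
        zs.append(W @ acts[-1] + b)
        acts.append(sigmoid(zs[-1]))
    return zs, acts

def loss_fn(y, y_des):
    return 0.5 * float(np.sum((y - y_des) ** 2))

def backward(x, y_des, params):
    """Return [(dloss/dW, dloss/db), ...] per layer via the error-term recursion."""
    zs, acts = forward(x, params)
    delta = (acts[-1] - y_des) * sigmoid_prime(zs[-1])   # last-layer error term
    grads = [None] * len(params)
    for l in reversed(range(len(params))):
        grads[l] = (delta @ acts[l].T, delta)
        if l > 0:
            W = params[l][0]
            # Propagate the error term to the preceding layer.
            delta = (W.T @ delta) * sigmoid_prime(zs[l - 1])
    return grads

# Illustrative random parameters and data.
rng = np.random.default_rng(0)
sizes = [2, 3, 3, 2]
params = [(rng.standard_normal((m, n)), rng.standard_normal((m, 1)))
          for n, m in zip(sizes[:-1], sizes[1:])]
x = np.array([[0.5], [-1.0]])
y_des = np.array([[1.0], [0.0]])
grads = backward(x, y_des, params)

# Central finite-difference estimate for one first-layer weight.
eps = 1e-6
W1 = params[0][0]
W1[1, 0] += eps
loss_plus = loss_fn(forward(x, params)[1][-1], y_des)
W1[1, 0] -= 2 * eps
loss_minus = loss_fn(forward(x, params)[1][-1], y_des)
W1[1, 0] += eps   # restore the original value
numeric = (loss_plus - loss_minus) / (2 * eps)
analytic = grads[0][0][1, 0]
```

If the recursion is implemented correctly, `numeric` and `analytic` should agree to several decimal places; a mismatch usually points to a transposed weight matrix or a derivative evaluated at the wrong layer's pre-activations.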