Layout
Numerator layout
$$\frac{\partial y}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} &\cdots & \frac{\partial y}{\partial x_n} \end{pmatrix}$$
$$\frac{\partial \mathbf{y}}{\partial x} = \begin{pmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_n}{\partial x} \end{pmatrix}$$
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{cccc} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{array}\right]$$
$$\frac{\partial y}{\partial \mathbf{X}}=\left[\begin{array}{cccc} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p 1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p 2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1 q}} & \frac{\partial y}{\partial x_{2 q}} & \cdots & \frac{\partial y}{\partial x_{p q}} \end{array}\right]$$
$$\frac{\partial \mathbf{Y}}{\partial x}=\left[\begin{array}{cccc} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1 n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2 n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m 1}}{\partial x} & \frac{\partial y_{m 2}}{\partial x} & \cdots & \frac{\partial y_{m n}}{\partial x} \end{array}\right]$$
$$\mathrm{d} \mathbf{X}=\left[\begin{array}{cccc} \mathrm{d} x_{11} & \mathrm{d} x_{12} & \cdots & \mathrm{d} x_{1 n} \\ \mathrm{d} x_{21} & \mathrm{d} x_{22} & \cdots & \mathrm{d} x_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{d} x_{m 1} & \mathrm{d} x_{m 2} & \cdots & \mathrm{d} x_{m n} \end{array}\right]$$
Denominator layout
$$\frac{\partial y}{\partial \mathbf{x}}=\left[\begin{array}{c} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{array}\right]$$
$$\frac{\partial \mathbf{y}}{\partial x}=\left[\begin{array}{cccc} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{array}\right]$$
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{cccc} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{array}\right]$$
$$\frac{\partial y}{\partial \mathbf{X}}=\left[\begin{array}{cccc} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1 q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2 q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p 1}} & \frac{\partial y}{\partial x_{p 2}} & \cdots & \frac{\partial y}{\partial x_{p q}} \end{array}\right]$$
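To make the difference between the two layouts concrete, here is a minimal sketch that builds a vector-by-vector Jacobian by central differences. The map `f`, the point `x0`, and the helper names are hypothetical choices for illustration only. In numerator layout the result is $m \times n$ (rows follow $\mathbf{y}$), and the denominator-layout matrix is its transpose.

```python
def jacobian_numerator(f, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: J[i][j] = d f_i / d x_j."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def transpose(A):
    return [list(row) for row in zip(*A)]


def f(x):
    # Hypothetical map from R^3 to R^2, chosen only for illustration.
    return [x[0] * x[1], x[1] + x[2] ** 2]


x0 = [1.0, 2.0, 3.0]
J_num = jacobian_numerator(f, x0)  # numerator layout: m x n = 2 x 3
J_den = transpose(J_num)           # denominator layout: n x m = 3 x 2

print(len(J_num), len(J_num[0]))   # shape in numerator layout
print(len(J_den), len(J_den[0]))   # shape in denominator layout
```

Both matrices hold the same partial derivatives; only the arrangement of rows and columns changes.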
Vector-by-vector derivatives
Derivation 1:
Let $v = v\left(\mathbf{x}\right)$ be a scalar function and $\mathbf{u}=\mathbf{u}\left(\mathbf{x}\right)$ a vector function, and consider $\frac{\partial v \mathbf{u}}{\partial \mathbf{x}}$.
Numerator layout:
$$\frac{\partial \left(v \mathbf{u}\right)_i}{\partial \mathbf{x}_j}=\frac{\partial \left(v \mathbf{u}_i\right)}{\partial \mathbf{x}_j}=\frac{\partial v}{\partial \mathbf{x}_j}\mathbf{u}_i + v\frac{\partial \mathbf{u}_i}{\partial \mathbf{x}_j}=\mathbf{u}_i\left(\frac{\partial v}{\partial \mathbf{x}}\right)_j +v\left(\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\right)_{ij}$$
Hence
$$\frac{\partial v \mathbf{u}}{\partial \mathbf{x}}=\mathbf{u}\frac{\partial v}{\partial \mathbf{x}}+v\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$$
Denominator layout:
$$\frac{\partial \left(v \mathbf{u}\right)_j}{\partial \mathbf{x}_i}=\frac{\partial \left(v \mathbf{u}_j\right)}{\partial \mathbf{x}_i}=\frac{\partial v}{\partial \mathbf{x}_i}\mathbf{u}_j + v\frac{\partial \mathbf{u}_j}{\partial \mathbf{x}_i}=\left(\frac{\partial v}{\partial \mathbf{x}}\right)_i \mathbf{u}_j +v\left(\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\right)_{ij}$$
Hence
$$\frac{\partial v \mathbf{u}}{\partial \mathbf{x}}=\frac{\partial v}{\partial \mathbf{x}} \mathbf{u}^T+v\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$$
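The product rule from Derivation 1 can be sanity-checked numerically. The sketch below works in numerator layout and compares a finite-difference Jacobian of $v\mathbf{u}$ with $\mathbf{u}\frac{\partial v}{\partial \mathbf{x}}+v\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$. The functions `v` and `u` and the point `x0` are hypothetical examples, not taken from the text.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: J[i][j] = d f_i / d x_j."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def v(x):
    # Hypothetical scalar function of x in R^2.
    return x[0] + x[1] ** 2


def u(x):
    # Hypothetical vector function from R^2 to R^3.
    return [x[0] * x[1], x[1] ** 2, x[0]]


def vu(x):
    s = v(x)
    return [s * ui for ui in u(x)]


x0 = [0.5, -1.5]
lhs = jacobian(vu, x0)                      # d(v u)/dx, shape 3 x 2
grad_v = jacobian(lambda x: [v(x)], x0)[0]  # dv/dx as a row vector, length 2
J_u = jacobian(u, x0)                       # du/dx, shape 3 x 2
u0, v0 = u(x0), v(x0)

# u (dv/dx) is an outer product (3 x 1 times 1 x 2); v (du/dx) is a scaled Jacobian.
rhs = [[u0[i] * grad_v[j] + v0 * J_u[i][j] for j in range(2)] for i in range(3)]
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(2))
print(max_err)
```

Transposing both sides gives the denominator-layout identity, where the outer product becomes $\frac{\partial v}{\partial \mathbf{x}}\mathbf{u}^T$.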
Derivation 2:
Let $\mathbf{g}\left(\mathbf{u}\right):\mathbb{R}^{n}\to\mathbb{R}^n$ with $\mathbf{u}=\mathbf{u}\left(\mathbf{x}\right)$. Then
$$\frac{\partial g_i}{\partial x_j}=\sum_{k}\frac{\partial g_i}{\partial u_k} \frac{\partial u_k}{\partial x_j}$$
Numerator layout:
$$\frac{\partial \mathbf{g}}{\partial \mathbf{x}} = \frac{\partial \mathbf{g}}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$$
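As a rough check of the chain rule in numerator layout, the following sketch compares the Jacobian of the composition with the product of the two Jacobians. The maps `u` and `g` are hypothetical, and all Jacobians are approximated by central differences.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: J[i][j] = d f_i / d x_j."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def u(x):
    # Hypothetical inner map from R^2 to R^2.
    return [x[0] * x[1], x[0] + x[1]]


def g(u_val):
    # Hypothetical outer map from R^2 to R^2.
    return [u_val[0] ** 2, u_val[0] * u_val[1]]


def g_of_x(x):
    return g(u(x))


x0 = [1.5, -0.5]
J_direct = jacobian(g_of_x, x0)                        # dg/dx computed directly
J_chain = matmul(jacobian(g, u(x0)), jacobian(u, x0))  # (dg/du)(du/dx)
max_err = max(abs(J_direct[i][j] - J_chain[i][j]) for i in range(2) for j in range(2))
print(max_err)
```

In denominator layout the factors appear in the opposite order, since each Jacobian is transposed.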
Example: let $l=\|\mathbf{X}\mathbf{w}-\mathbf{y}\|^2$, where $\mathbf{X}\in\mathbb{R}^{m\times n}$, $\mathbf{w}\in\mathbb{R}^n$ and $\mathbf{y}\in\mathbb{R}^m$. Find $\frac{\partial l}{\partial \mathbf{w}}$.
Let $\mathbf{u} = \mathbf{X}\mathbf{w}-\mathbf{y}$. In denominator layout the chain rule multiplies from the left, so
$$\frac{\partial l}{\partial \mathbf{w}} = \frac{\partial \mathbf{u}}{\partial \mathbf{w}} \frac{\partial l}{\partial \mathbf{u}}=\mathbf{X}^T 2\mathbf{u}=2\mathbf{X}^T\left( \mathbf{X}\mathbf{w}-\mathbf{y}\right)$$
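The closed form $2\mathbf{X}^T(\mathbf{X}\mathbf{w}-\mathbf{y})$ can be compared against a finite-difference gradient. This is a minimal sketch in plain Python; the data `X`, `w`, `y` and the function names are made up for the check.

```python
def residual(X, w, y):
    return [sum(X[i][j] * w[j] for j in range(len(w))) - y[i] for i in range(len(y))]


def loss(X, w, y):
    # l = ||Xw - y||^2
    return sum(r * r for r in residual(X, w, y))


def grad_formula(X, w, y):
    # 2 X^T (Xw - y), one entry per component of w
    r = residual(X, w, y)
    return [2.0 * sum(X[i][j] * r[i] for i in range(len(y))) for j in range(len(w))]


def grad_numeric(X, w, y, h=1e-6):
    # Central differences, one coordinate of w at a time.
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += h
        w_minus[j] -= h
        grad.append((loss(X, w_plus, y) - loss(X, w_minus, y)) / (2 * h))
    return grad


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # m = 3, n = 2
w = [0.5, -1.0]
y = [1.0, 0.0, 2.0]

g_formula = grad_formula(X, w, y)
g_numeric = grad_numeric(X, w, y)
print(g_formula)
print(g_numeric)
```

The gradient has the same shape as $\mathbf{w}$, which is the usual reason denominator layout is preferred for gradient-based optimization.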
Scalar-by-vector derivatives
Differentials
Denominator layout
For $\mathbf{X}\in\mathbb{R}^{m\times n}$:
$$\mathrm{d} f=\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\partial f}{\partial x_{ij}}\mathrm{d}x_{ij}=\operatorname{tr}\left(\left(\frac{\partial f}{\partial \mathbf{X}}\right)^T\mathrm{d}\mathbf{X}\right)$$
Rules:
$$\mathrm{d}\left(\mathbf{X} \pm \mathbf{Y}\right) = \mathrm{d}\mathbf{X} \pm \mathrm{d}\mathbf{Y}$$
$$\mathrm{d}\left(\mathbf{X} \mathbf{Y}\right) =\mathrm{d}\left(\mathbf{X} \right) \mathbf{Y}+ \mathbf{X}\, \mathrm{d}\left(\mathbf{Y}\right)$$
$$\mathrm{d}\left(\mathbf{X}^T\right)=\left(\mathrm{d} \mathbf{X}\right)^T$$
$$\mathrm{d} \operatorname{tr}\left(\mathbf{X}\right)=\operatorname{tr}\left(\mathrm{d} \mathbf{X}\right)$$
$$\mathrm{d} \mathbf{X}^{-1}=-\mathbf{X}^{-1}\left(\mathrm{d}\mathbf{X}\right) \mathbf{X}^{-1}$$
$$\mathrm{d}\left|\mathbf{X}\right|=\operatorname{tr}\left(\mathbf{X}^{*}\mathrm{d}\mathbf{X}\right) = \left|\mathbf{X}\right|\operatorname{tr}\left(\mathbf{X}^{-1}\mathrm{d}\mathbf{X}\right)$$
Here $\mathbf{X}^{*}$ denotes the adjugate of $\mathbf{X}$; the second equality requires $\mathbf{X}$ to be invertible.
$$\mathrm{d}(\mathbf{X} \odot \mathbf{Y})=\mathrm{d} \mathbf{X} \odot \mathbf{Y}+\mathbf{X} \odot \mathrm{d} \mathbf{Y}$$
$$\mathrm{d} \sigma(\mathbf{X})=\sigma^{\prime}(\mathbf{X}) \odot \mathrm{d} \mathbf{X}$$
Here $\sigma$ is applied elementwise and $\sigma^{\prime}$ is its scalar derivative, also applied elementwise.
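Two of the differential rules, for the determinant and for the inverse, are easy to get wrong, so a small numerical check may help. The sketch below uses a hypothetical $2\times 2$ matrix `X` and a small perturbation `dX`; the rules hold to first order, so the remaining error should be of the order of the squared perturbation.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


X = [[2.0, 1.0], [0.5, 3.0]]            # hypothetical invertible matrix
dX = [[1e-6, -2e-6], [3e-6, 0.5e-6]]    # small perturbation standing in for dX
X_new = add(X, dX)

# d|X| = |X| tr(X^{-1} dX)
det_change = det2(X_new) - det2(X)
det_rule = det2(X) * trace(matmul(inv2(X), dX))
det_err = abs(det_change - det_rule)

# d(X^{-1}) = -X^{-1} (dX) X^{-1}
inv_change = sub(inv2(X_new), inv2(X))
inv_rule = [[-e for e in row] for row in matmul(matmul(inv2(X), dX), inv2(X))]
inv_err = max(abs(inv_change[i][j] - inv_rule[i][j]) for i in range(2) for j in range(2))

print(det_err, inv_err)
```

Both errors are second order in the perturbation, which is consistent with the rules being exact statements about the linear part of the change.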
Tricks:
A scalar equals its own trace:
$$x = \operatorname{tr}\left(x\right)$$
$$\operatorname{tr}\left(\mathbf{X}^T\right)=\operatorname{tr}\left(\mathbf{X}\right)$$
$$\operatorname{tr}\left(\mathbf{X} \pm \mathbf{Y}\right)=\operatorname{tr}\left(\mathbf{X}\right) \pm \operatorname{tr}\left(\mathbf{Y}\right)$$
$$\operatorname{tr}\left(\mathbf{X}\mathbf{Y}\right) = \operatorname{tr}\left(\mathbf{Y}\mathbf{X}\right)$$
$$\operatorname{tr}\left(\mathbf{A}^T\left(\mathbf{B}\odot\mathbf{C}\right)\right)=\operatorname{tr}\left(\left(\mathbf{A}\odot\mathbf{B}\right)^T\mathbf{C}\right)$$
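The last two trace identities can be verified on small integer matrices, where the arithmetic is exact. The matrices below are arbitrary examples. Both sides of the Hadamard identity reduce to $\sum_{ij} a_{ij} b_{ij} c_{ij}$, and the cyclic identity also holds when $\mathbf{X}$ and $\mathbf{Y}$ are not square, as long as both products are defined.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def hadamard(A, B):
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


# Arbitrary integer matrices of equal shape for the Hadamard identity.
A = [[1, 2], [3, 4]]
B = [[0, 1], [2, -1]]
C = [[5, -2], [1, 3]]

left = trace(matmul(transpose(A), hadamard(B, C)))   # tr(A^T (B ⊙ C))
right = trace(matmul(transpose(hadamard(A, B)), C))  # tr((A ⊙ B)^T C)
elementwise = sum(A[i][j] * B[i][j] * C[i][j] for i in range(2) for j in range(2))

# Non-square matrices for tr(XY) = tr(YX): X is 2 x 3, Y is 3 x 2.
X = [[1, 2, 3], [4, 5, 6]]
Y = [[1, 0], [0, 1], [2, -1]]
tr_xy = trace(matmul(X, Y))  # trace of a 2 x 2 matrix
tr_yx = trace(matmul(Y, X))  # trace of a 3 x 3 matrix

print(left, right, elementwise)
print(tr_xy, tr_yx)
```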
Worked example:
$$f = \operatorname{tr}\left(\mathbf{Y}^T\mathbf{M}\mathbf{Y}\right),\quad \mathbf{Y} = \sigma\left(\mathbf{W}\mathbf{X}\right)$$
$$\mathrm{d}f=\operatorname{tr}\left(\mathrm{d}\mathbf{Y}^T\mathbf{M}\mathbf{Y}+\mathbf{Y}^T\mathbf{M}\,\mathrm{d}\mathbf{Y}\right)\Rightarrow\frac{\partial f}{\partial \mathbf{Y}}=\mathbf{M}\mathbf{Y}+\mathbf{M}^T\mathbf{Y}$$
$$\mathrm{d}\mathbf{Y} = \sigma^{\prime}\left(\mathbf{W}\mathbf{X}\right)\odot\left( \mathbf{W}\,\mathrm{d}\mathbf{X}\right)$$
$$\mathrm{d}f=\operatorname{tr}\left(\left(\frac{\partial f}{\partial\mathbf{Y}}\right)^T\mathrm{d}\mathbf{Y}\right)=\operatorname{tr}\left(\left(\frac{\partial f}{\partial\mathbf{Y}}\right)^T\left(\sigma^{\prime}\left(\mathbf{W}\mathbf{X}\right)\odot\left( \mathbf{W}\,\mathrm{d}\mathbf{X}\right)\right)\right)=\operatorname{tr}\left(\left(\frac{\partial f}{\partial\mathbf{Y}}\odot\sigma^{\prime}\left(\mathbf{W}\mathbf{X}\right)\right)^T\left( \mathbf{W}\,\mathrm{d}\mathbf{X}\right)\right)$$
Therefore
$$\frac{\partial f}{\partial \mathbf{X}}=\mathbf{W}^T\left(\left(\mathbf{M}\mathbf{Y}+\mathbf{M}^T\mathbf{Y}\right)\odot\sigma^{\prime}\left(\mathbf{W}\mathbf{X}\right)\right)$$
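The final formula can be checked against finite differences. The sketch below assumes $\sigma$ is the logistic sigmoid (the text leaves $\sigma$ generic) and uses small hypothetical matrices `W`, `X`, `M`; `M` is deliberately not symmetric so that both $\mathbf{M}\mathbf{Y}$ and $\mathbf{M}^T\mathbf{Y}$ matter.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def hadamard(A, B):
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def apply(fn, A):
    return [[fn(a) for a in row] for row in A]


def sigmoid(z):
    # Assumed activation; the text does not fix a particular sigma.
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


W = [[0.5, -1.0, 0.3], [0.2, 0.4, -0.7]]   # 2 x 3
X = [[0.1, 0.9], [-0.4, 0.2], [0.7, -0.3]]  # 3 x 2
M = [[1.0, 2.0], [-0.5, 0.3]]               # 2 x 2, not symmetric


def f(X_val):
    Y = apply(sigmoid, matmul(W, X_val))
    P = matmul(transpose(Y), matmul(M, Y))
    return sum(P[i][i] for i in range(len(P)))  # tr(Y^T M Y)


def grad_formula(X_val):
    Z = matmul(W, X_val)
    Y = apply(sigmoid, Z)
    G = add(matmul(M, Y), matmul(transpose(M), Y))  # df/dY = MY + M^T Y
    return matmul(transpose(W), hadamard(G, apply(sigmoid_prime, Z)))


def grad_numeric(X_val, h=1e-6):
    G = [[0.0] * len(X_val[0]) for _ in X_val]
    for i in range(len(X_val)):
        for j in range(len(X_val[0])):
            X_plus = [row[:] for row in X_val]
            X_minus = [row[:] for row in X_val]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


G_formula = grad_formula(X)
G_numeric = grad_numeric(X)
max_err = max(abs(G_formula[i][j] - G_numeric[i][j]) for i in range(3) for j in range(2))
print(max_err)
```

The gradient has the same shape as $\mathbf{X}$, as expected in denominator layout.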
References:
https://zhuanlan.zhihu.com/p/24709748
https://en.wikipedia.org/wiki/Matrix_calculus#convert_differential_derivative