Reading Notes: Denoising Diffusion Probabilistic Models (3)


Reading Notes: Denoising Diffusion Probabilistic Models (1)
Reading Notes: Denoising Diffusion Probabilistic Models (2)
Reading Notes: Denoising Diffusion Probabilistic Models (3)

4. Term-by-Term Analysis of the Loss Function

As shown above, $L$ splits into three terms. Consider the first term, $L_1$:
\begin{equation}
\begin{split}
L_1&=E_{x_{1:T} \sim q(x_{1:T}|x_0)}\Bigg(\log\Big[\frac{q(x_T|x_0)}{p(x_T)}\Big]\Bigg)\\
&=\int dx_{1:T}\cdot q(x_{1:T}|x_0)\cdot\log\Big[\frac{q(x_T|x_0)}{p(x_T)}\Big]\\
&=\int dx_{1:T}\cdot\frac{q(x_{1:T}|x_0)}{q(x_T|x_0)}\cdot q(x_T|x_0)\cdot\log\Big[\frac{q(x_T|x_0)}{p(x_T)}\Big]\\
&=\int dx_{1:T}\cdot\underbrace{q(x_{1:T-1}|x_0,x_T)}_{q(x_{1:T}|x_0)=q(x_T|x_0)\cdot q(x_{1:T-1}|x_0,x_T)}\cdot q(x_T|x_0)\cdot\log\Big[\frac{q(x_T|x_0)}{p(x_T)}\Big]\\
&=\int\Bigg(\underbrace{\int q(x_{1:T-1}|x_0,x_T)\prod_{k=1}^{T-1}dx_k}_{\text{inner integral over }x_{1:T-1}\text{, equal to }1}\Bigg)\cdot q(x_T|x_0)\cdot\log\Big[\frac{q(x_T|x_0)}{p(x_T)}\Big]\cdot dx_T\\
&=\int q(x_T|x_0)\cdot\log\Big[\frac{q(x_T|x_0)}{p(x_T)}\Big]\cdot dx_T\\
&=E_{x_T\sim q(x_T|x_0)}\log\Big[\frac{q(x_T|x_0)}{p(x_T)}\Big]\\
&=KL\Big(q(x_T|x_0)\,||\,p(x_T)\Big)
\end{split}
\end{equation}

So $L_1$ is the KL divergence between $q(x_T|x_0)$ and $p(x_T)$. Here $q(x_T|x_0)$ is the endpoint of the forward noising process, which converges to the standard normal distribution, and $p(x_T)$ is a fixed Gaussian, as stated in the fourth line of Section 2 (Background) of "Denoising Diffusion Probabilistic Models". $L_1$ can therefore be evaluated in closed form as the KL divergence between two Gaussians; it contains no trainable parameters and is effectively a constant, so it can be dropped from the loss.
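
To see concretely how small this constant is, here is a minimal NumPy sketch (my own check, not from the paper's code), assuming the linear schedule $\beta_1=10^{-4},\dots,\beta_T=0.02$ with $T=1000$ used in the paper. It evaluates the per-dimension KL between $q(x_T|x_0)=\mathcal{N}(\sqrt{\bar{\alpha}_T}x_0,(1-\bar{\alpha}_T)I)$ and $p(x_T)=\mathcal{N}(0,I)$:

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # linear schedule from the paper
alpha_bar_T = np.prod(1.0 - betas)     # cumulative product at the last step

x0 = 1.0                               # a pixel at the edge of the data range [-1, 1]
mu = np.sqrt(alpha_bar_T) * x0         # mean of q(x_T | x_0)
var = 1.0 - alpha_bar_T                # variance of q(x_T | x_0)
# closed-form KL( N(mu, var) || N(0, 1) ) per dimension
kl = 0.5 * (-np.log(var) + var + mu ** 2 - 1.0)
print(alpha_bar_T, kl)                 # roughly 4e-5 and 2e-5 nats: negligible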

Next, consider the second term, $L_2$:

\begin{equation}
\begin{split}
L_2&=E_{x_{1:T}\sim q(x_{1:T}|x_0)}\Bigg(\sum_{t=2}^{T}\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\Bigg)\\
&=\sum_{t=2}^{T}E_{x_{1:T}\sim q(x_{1:T}|x_0)}\Bigg(\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(\int dx_{1:T}\cdot q(x_{1:T}|x_0)\cdot\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(\int dx_{1:T}\cdot\frac{q(x_{1:T}|x_0)}{q(x_{t-1}|x_t,x_0)}\cdot q(x_{t-1}|x_t,x_0)\cdot\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(\int dx_{1:T}\cdot\underbrace{\frac{q(x_{0:T})}{q(x_0)}}_{q(x_{0:T})=q(x_0)\cdot q(x_{1:T}|x_0)}\cdot\underbrace{\frac{q(x_t,x_0)}{q(x_t,x_{t-1},x_0)}}_{q(x_t,x_{t-1},x_0)=q(x_t,x_0)\cdot q(x_{t-1}|x_t,x_0)}\cdot q(x_{t-1}|x_t,x_0)\cdot\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(\int dx_{1:T}\cdot\frac{q(x_{0:T})}{q(x_0)}\cdot\frac{q(x_t,x_0)}{q(x_{t-1},x_0)\cdot q(x_t|x_{t-1},x_0)}\cdot q(x_{t-1}|x_t,x_0)\cdot\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(\int\bigg[\int\frac{q(x_{0:T})}{q(x_0)}\cdot\frac{q(x_t,x_0)}{q(x_{t-1},x_0)\cdot q(x_t|x_{t-1},x_0)}\prod_{k\geq1,k\neq t-1}dx_k\bigg]\cdot q(x_{t-1}|x_t,x_0)\cdot\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\,dx_{t-1}\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(\int\bigg[\int\frac{q(x_{0:T})}{q(x_{t-1},x_0)}\cdot\frac{q(x_t,x_0)}{q(x_0)\cdot q(x_t|x_{t-1},x_0)}\prod_{k\geq1,k\neq t-1}dx_k\bigg]\cdot q(x_{t-1}|x_t,x_0)\cdot\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\,dx_{t-1}\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(\int\bigg[\int\underbrace{q(x_{k:k\geq1,k\neq t-1}|x_{t-1},x_0)}_{q(x_{0:T})=q(x_{t-1},x_0)\cdot q(x_{k:k\geq1,k\neq t-1}|x_{t-1},x_0)}\cdot\underbrace{\frac{q(x_t|x_0)}{q(x_t|x_{t-1},x_0)}}_{q(x_t,x_0)=q(x_0)\cdot q(x_t|x_0)}\prod_{k\geq1,k\neq t-1}dx_k\bigg]\cdot q(x_{t-1}|x_t,x_0)\cdot\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\,dx_{t-1}\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(\int\bigg[\underbrace{\int q(x_{k:k\geq1,k\neq t-1}|x_{t-1},x_0)\cdot\frac{q(x_t|x_0)}{q(x_t|x_{t-1},x_0)}\prod_{k\geq1,k\neq t-1}dx_k}_{=1}\bigg]\cdot q(x_{t-1}|x_t,x_0)\cdot\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\,dx_{t-1}\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(\int q(x_{t-1}|x_t,x_0)\cdot\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\,dx_{t-1}\Bigg)\\
&=\sum_{t=2}^{T}\Bigg(E_{x_{t-1}\sim q(x_{t-1}|x_t,x_0)}\log\Big[\frac{q(x_{t-1}|x_t,x_0)}{p_\theta(x_{t-1}|x_t)}\Big]\Bigg)\\
&=\sum_{t=2}^{T}KL\Big(q(x_{t-1}|x_t,x_0)\,||\,p_\theta(x_{t-1}|x_t)\Big)
\end{split}
\end{equation}

Note that the "$=1$" annotation applies to the inner integral as a whole: in the factorized chain, $q(x_t|x_{t-1},x_0)$ cancels against the corresponding factor of the conditional density and is replaced by $q(x_t|x_0)$, after which everything integrates to one. Strictly speaking, each KL summand above is still averaged over $x_t\sim q(x_t|x_0)$; this outer expectation is left implicit here and in what follows.
Finally, consider $L_3$. The earlier paper "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" notes that, to avoid an edge effect, the reverse process is forced to satisfy $p(x_0|x_1)=q(x_1|x_0)$, so this term is also a constant.

From the analysis above, the loss function can be written as formula (3).
\begin{equation}
\begin{split}
L&:=L_1+L_2+L_3\\
&=KL\Big(q(x_T|x_0)\,||\,p(x_T)\Big)+\sum_{t=2}^{T}KL\Big(q(x_{t-1}|x_t,x_0)\,||\,p_\theta(x_{t-1}|x_t)\Big)-\log\Big[p_\theta(x_0|x_1)\Big]
\end{split}
\end{equation}

Ignoring $L_1$ and $L_3$, the loss function reduces to formula (4).
\begin{equation}
\begin{split}
L:=\sum_{t=2}^{T}KL\Big(q(x_{t-1}|x_t,x_0)\,||\,p_\theta(x_{t-1}|x_t)\Big)
\end{split}
\end{equation}

The loss $L$ is therefore a sum of KL divergences between two Gaussian distributions, $q(x_{t-1}|x_t,x_0)$ and $p_\theta(x_{t-1}|x_t)$. From Reading Notes: Denoising Diffusion Probabilistic Models (1), the standard deviation and mean of $q(x_{t-1}|x_t,x_0)$ are

\begin{equation}
\begin{split}
\sigma_1&=\sqrt{\frac{\beta_t\cdot(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}}\\
\mu_1&=\frac{1}{\sqrt{\alpha_t}}\cdot\Big(x_t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\cdot z_t\Big)\\
\text{or equivalently}\quad\mu_1&=\frac{\sqrt{\alpha_t}\cdot(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\cdot x_t+\frac{\beta_t\cdot\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_t}\cdot x_0
\end{split}
\end{equation}
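
The two expressions for $\mu_1$ are equivalent: substituting $z_t=(x_t-\sqrt{\bar{\alpha}_t}x_0)/\sqrt{1-\bar{\alpha}_t}$ into the first form yields the second. A small NumPy sketch (my own check, with an arbitrary scalar standing in for an image) confirms this numerically:

import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = np.cumprod(alphas)

t = 500                              # an arbitrary timestep
x0, z = rng.normal(), rng.normal()   # scalar stand-ins for x_0 and z_t
x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * z

# mu_1 parameterized by (x_t, z_t)
mu_eps = (x_t - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * z) / np.sqrt(alphas[t])
# mu_1 parameterized by (x_t, x_0)
mu_x0 = (np.sqrt(alphas[t]) * (1.0 - alphas_bar[t - 1]) * x_t
         + betas[t] * np.sqrt(alphas_bar[t - 1]) * x0) / (1.0 - alphas_bar[t])
assert np.isclose(mu_eps, mu_x0)     # the two parameterizations agree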

The mean and variance of $p_\theta(x_{t-1}|x_t)$, in contrast, are estimated by a model (a deep network or otherwise); denote them $\mu_2$ and $\sigma_2$.
Each KL term of the loss $L$ can then be written in closed form as formula (6).
\begin{equation}
\begin{split}
L:=\log\Big[\frac{\sigma_2}{\sigma_1}\Big]+\frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2}-\frac{1}{2}
\end{split}
\end{equation}
The paper "Denoising Diffusion Probabilistic Models" simply fixes the variance $\sigma_2$ (written $\sigma_t$ in the paper) to either $\beta_t$ or $\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\cdot\beta_t$; Eqs. (8), (9), and (10) of the paper are written under this choice, and the paper reports that the two settings give similar experimental results. Accordingly, the released code implements two loss functions: the first is formula (6), as in the function normal_kl below; the second ignores the variance and computes the squared difference of the means directly, as in the mse branch of training_losses below.
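
As a sanity check, formula (6) and the log-variance form used by normal_kl agree. A minimal NumPy sketch (my own, not part of the repo):

import numpy as np

def gaussian_kl(mu1, sigma1, mu2, sigma2):
    # formula (6): KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) )
    return (np.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)

def normal_kl_np(mean1, logvar1, mean2, logvar2):
    # the same quantity parameterized by log-variances, as in normal_kl below
    return 0.5 * (-1.0 + logvar2 - logvar1 + np.exp(logvar1 - logvar2)
                  + (mean1 - mean2) ** 2 * np.exp(-logvar2))

mu1, s1, mu2, s2 = 0.3, 0.8, -0.1, 1.2
assert np.isclose(gaussian_kl(mu1, s1, mu2, s2),
                  normal_kl_np(mu1, 2 * np.log(s1), mu2, 2 * np.log(s2)))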

5. Code Walkthrough

Finally, let us walk through the training and sampling procedures using the official code: diffusion, https://github.com/hojonathanho/diffusion.
First, the training procedure.

class GaussianDiffusion2:
	  """
	  Contains utilities for the diffusion model.
	  Arguments:
	  - what the network predicts (x_{t-1}, x_0, or epsilon)
	  - which loss function (kl or unweighted MSE)
	  - what is the variance of p(x_{t-1}|x_t) (learned, fixed to beta, or fixed to weighted beta)
	  - what type of decoder, and how to weight its loss? is its variance learned too?
	  """
	
	# Definitions and precomputed schedule constants for the diffusion model
	def __init__(self, *, betas, model_mean_type, model_var_type, loss_type):
	    self.model_mean_type = model_mean_type  # xprev, xstart, eps
	    self.model_var_type = model_var_type  # learned, fixedsmall, fixedlarge
	    self.loss_type = loss_type  # kl, mse
	
	    assert isinstance(betas, np.ndarray)
	    self.betas = betas = betas.astype(np.float64)  # computations here in float64 for accuracy
	    assert (betas > 0).all() and (betas <= 1).all()
	    timesteps, = betas.shape
	    self.num_timesteps = int(timesteps)
	
	    alphas = 1. - betas
	    self.alphas_cumprod = np.cumprod(alphas, axis=0)
	    self.alphas_cumprod_prev = np.append(1., self.alphas_cumprod[:-1])
	    assert self.alphas_cumprod_prev.shape == (timesteps,)
	
	    # calculations for diffusion q(x_t | x_{t-1}) and others
	    self.sqrt_alphas_cumprod = np.sqrt(self.alphas_cumprod)
	    self.sqrt_one_minus_alphas_cumprod = np.sqrt(1. - self.alphas_cumprod)
	    self.log_one_minus_alphas_cumprod = np.log(1. - self.alphas_cumprod)
	    self.sqrt_recip_alphas_cumprod = np.sqrt(1. / self.alphas_cumprod)
	    self.sqrt_recipm1_alphas_cumprod = np.sqrt(1. / self.alphas_cumprod - 1)
	
	    # calculations for posterior q(x_{t-1} | x_t, x_0)
	    self.posterior_variance = betas * (1. - self.alphas_cumprod_prev) / (1. - self.alphas_cumprod)
	    # below: log calculation clipped because the posterior variance is 0 at the beginning of the diffusion chain
	    self.posterior_log_variance_clipped = np.log(np.append(self.posterior_variance[1], self.posterior_variance[1:]))
	    self.posterior_mean_coef1 = betas * np.sqrt(self.alphas_cumprod_prev) / (1. - self.alphas_cumprod)
	    self.posterior_mean_coef2 = (1. - self.alphas_cumprod_prev) * np.sqrt(alphas) / (1. - self.alphas_cumprod)
	
	# Method from the Model class
	def train_fn(self, x, y):
	    B, H, W, C = x.shape
	    if self.randflip:
	      x = tf.image.random_flip_left_right(x)
	      assert x.shape == [B, H, W, C]
	    # Sample a random timestep t for each example
	    t = tf.random_uniform([B], 0, self.diffusion.num_timesteps, dtype=tf.int32)
	    # Compute the loss at timestep t
	    losses = self.diffusion.training_losses(
	      denoise_fn=functools.partial(self._denoise, y=y, dropout=self.dropout), x_start=x, t=t)
	    assert losses.shape == t.shape == [B]
	    return {'loss': tf.reduce_mean(losses)}
	
	# Sample the noisy image at step t directly from x_start
	def q_sample(self, x_start, t, noise=None):
	    """
	    Diffuse the data (t == 0 means diffused for 1 step)
	    """
	    if noise is None:
	      noise = tf.random_normal(shape=x_start.shape)
	    assert noise.shape == x_start.shape
	    return (
	        self._extract(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start +
	        self._extract(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise
	    )
	
	# Compute the mean and variance of q(x_{t-1} | x_t, x_0)
	def q_posterior_mean_variance(self, x_start, x_t, t):
	    """
	    Compute the mean and variance of the diffusion posterior q(x_{t-1} | x_t, x_0)
	    """
	    assert x_start.shape == x_t.shape
	    posterior_mean = (
	        self._extract(self.posterior_mean_coef1, t, x_t.shape) * x_start +
	        self._extract(self.posterior_mean_coef2, t, x_t.shape) * x_t
	    )
	    posterior_variance = self._extract(self.posterior_variance, t, x_t.shape)
	    posterior_log_variance_clipped = self._extract(self.posterior_log_variance_clipped, t, x_t.shape)
	    assert (posterior_mean.shape[0] == posterior_variance.shape[0] == posterior_log_variance_clipped.shape[0] ==
	            x_start.shape[0])
	    return posterior_mean, posterior_variance, posterior_log_variance_clipped
    
	# Estimate the mean and variance of p(x_{t-1} | x_t) with the UNet denoiser
	def p_mean_variance(self, denoise_fn, *, x, t, clip_denoised: bool, return_pred_xstart: bool):
	    B, H, W, C = x.shape
	    assert t.shape == [B]
	    model_output = denoise_fn(x, t)
	
	    # Learned or fixed variance?
	    if self.model_var_type == 'learned':
	      assert model_output.shape == [B, H, W, C * 2]
	      model_output, model_log_variance = tf.split(model_output, 2, axis=-1)
	      model_variance = tf.exp(model_log_variance)
	    elif self.model_var_type in ['fixedsmall', 'fixedlarge']:
	      # below: only log_variance is used in the KL computations
	      model_variance, model_log_variance = {
	        # for fixedlarge, we set the initial (log-)variance like so to get a better decoder log likelihood
	        'fixedlarge': (self.betas, np.log(np.append(self.posterior_variance[1], self.betas[1:]))),
	        'fixedsmall': (self.posterior_variance, self.posterior_log_variance_clipped),
	      }[self.model_var_type]
	      model_variance = self._extract(model_variance, t, x.shape) * tf.ones(x.shape.as_list())
	      model_log_variance = self._extract(model_log_variance, t, x.shape) * tf.ones(x.shape.as_list())
	    else:
	      raise NotImplementedError(self.model_var_type)
	
	    # Mean parameterization
	    _maybe_clip = lambda x_: (tf.clip_by_value(x_, -1., 1.) if clip_denoised else x_)
	    if self.model_mean_type == 'xprev':  # the model predicts x_{t-1}
	      pred_xstart = _maybe_clip(self._predict_xstart_from_xprev(x_t=x, t=t, xprev=model_output))
	      model_mean = model_output
	    elif self.model_mean_type == 'xstart':  # the model predicts x_0
	      pred_xstart = _maybe_clip(model_output)
	      model_mean, _, _ = self.q_posterior_mean_variance(x_start=pred_xstart, x_t=x, t=t)
	    elif self.model_mean_type == 'eps':  # the model predicts epsilon
	      pred_xstart = _maybe_clip(self._predict_xstart_from_eps(x_t=x, t=t, eps=model_output))
	      model_mean, _, _ = self.q_posterior_mean_variance(x_start=pred_xstart, x_t=x, t=t)
	    else:
	      raise NotImplementedError(self.model_mean_type)
	
	    assert model_mean.shape == model_log_variance.shape == pred_xstart.shape == x.shape
	    if return_pred_xstart:
	      return model_mean, model_variance, model_log_variance, pred_xstart
	    else:
	      return model_mean, model_variance, model_log_variance


	# Loss computation
	def training_losses(self, denoise_fn, x_start, t, noise=None):
	    assert t.shape == [x_start.shape[0]]
	    
	    # Draw random Gaussian noise
	    if noise is None:
	      noise = tf.random_normal(shape=x_start.shape, dtype=x_start.dtype)
	    assert noise.shape == x_start.shape and noise.dtype == x_start.dtype
	    
	    # Add the noise to x_start to obtain the noisy image at step t
	    x_t = self.q_sample(x_start=x_start, t=t, noise=noise)
		
		# Two loss types, 'kl' and 'mse'; empirically the two give similar results.
	    if self.loss_type == 'kl':  # the variational bound
	      losses = self._vb_terms_bpd(
	        denoise_fn=denoise_fn, x_start=x_start, x_t=x_t, t=t, clip_denoised=False, return_pred_xstart=False)
	    
	    elif self.loss_type == 'mse':  # unweighted MSE
	      assert self.model_var_type != 'learned'
	      target = {
	        'xprev': self.q_posterior_mean_variance(x_start=x_start, x_t=x_t, t=t)[0],
	        'xstart': x_start,
	        'eps': noise
	      }[self.model_mean_type]
	      model_output = denoise_fn(x_t, t)
	      assert model_output.shape == target.shape == x_start.shape
	      losses = nn.meanflat(tf.squared_difference(target, model_output))
	    else:
	      raise NotImplementedError(self.loss_type)
	
	    assert losses.shape == t.shape
	    return losses
	    
	# KL divergence between two Gaussians; logvar1 and logvar2 are log-variances. This is formula (6) above.
	def normal_kl(mean1, logvar1, mean2, logvar2):
	  return 0.5 * (-1.0 + logvar2 - logvar1 + tf.exp(logvar1 - logvar2)
	                + tf.squared_difference(mean1, mean2) * tf.exp(-logvar2))
	
	# Loss computation for the 'kl' loss type
	def _vb_terms_bpd(self, denoise_fn, x_start, x_t, t, *, clip_denoised: bool, return_pred_xstart: bool):
	    true_mean, _, true_log_variance_clipped = self.q_posterior_mean_variance(x_start=x_start, x_t=x_t, t=t)
	    model_mean, _, model_log_variance, pred_xstart = self.p_mean_variance(
	      denoise_fn, x=x_t, t=t, clip_denoised=clip_denoised, return_pred_xstart=True)
	    kl = normal_kl(true_mean, true_log_variance_clipped, model_mean, model_log_variance)
	    kl = nn.meanflat(kl) / np.log(2.)
	
	    decoder_nll = -utils.discretized_gaussian_log_likelihood(
	      x_start, means=model_mean, log_scales=0.5 * model_log_variance)
	    assert decoder_nll.shape == x_start.shape
	    decoder_nll = nn.meanflat(decoder_nll) / np.log(2.)
	
	    # At the first timestep return the decoder NLL, otherwise return KL(q(x_{t-1}|x_t,x_0) || p(x_{t-1}|x_t))
	    assert kl.shape == decoder_nll.shape == t.shape == [x_start.shape[0]]
	    output = tf.where(tf.equal(t, 0), decoder_nll, kl)
	    return (output, pred_xstart) if return_pred_xstart else output
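
For orientation, instantiating this helper class with the paper's linear schedule might look as follows (a sketch only; the repo's run scripts wire it up with a UNet denoiser, and the import path is my assumption):

import numpy as np
# from diffusion_tf.diffusion_utils_2 import GaussianDiffusion2  # module path in the repo (my assumption)

betas = np.linspace(1e-4, 0.02, 1000).astype(np.float64)
diffusion = GaussianDiffusion2(
    betas=betas,
    model_mean_type='eps',        # the network predicts the noise epsilon
    model_var_type='fixedlarge',  # variance fixed to beta_t
    loss_type='mse',              # the simplified, unweighted MSE objective
)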

Next, the sampling (inference) procedure.

def p_sample(self, denoise_fn, *, x, t, noise_fn, clip_denoised=True, return_pred_xstart: bool):
    """
    Sample from the model
    """
    # Use the network to estimate the mean and (log-)variance of x_{t-1} from x_t and t
    model_mean, _, model_log_variance, pred_xstart = self.p_mean_variance(
      denoise_fn, x=x, t=t, clip_denoised=clip_denoised, return_pred_xstart=True)
    noise = noise_fn(shape=x.shape, dtype=x.dtype)
    assert noise.shape == x.shape
    # no noise when t == 0
    nonzero_mask = tf.reshape(1 - tf.cast(tf.equal(t, 0), tf.float32), [x.shape[0]] + [1] * (len(x.shape) - 1))
    
    # For t > 0, Gaussian noise is added to the estimated mean because sampling continues; at t = 0 the loop ends, so no noise is added and the final result is returned.
    sample = model_mean + nonzero_mask * tf.exp(0.5 * model_log_variance) * noise
    assert sample.shape == pred_xstart.shape
    return (sample, pred_xstart) if return_pred_xstart else sample

def p_sample_loop(self, denoise_fn, *, shape, noise_fn=tf.random_normal):
    """
    Generate samples
    """
    assert isinstance(shape, (tuple, list))
    # Start the loop index at the last timestep, T - 1
    i_0 = tf.constant(self.num_timesteps - 1, dtype=tf.int32)
    # Draw random noise as x_T ~ p(x_T)
    img_0 = noise_fn(shape=shape, dtype=tf.float32)
    # Loop T times to obtain the final image
    _, img_final = tf.while_loop(
      cond=lambda i_, _: tf.greater_equal(i_, 0),
      body=lambda i_, img_: [
        i_ - 1,
        self.p_sample(
          denoise_fn=denoise_fn, x=img_, t=tf.fill([shape[0]], i_), noise_fn=noise_fn, return_pred_xstart=False)
      ],
      loop_vars=[i_0, img_0],
      shape_invariants=[i_0.shape, img_0.shape],
      back_prop=False
    )
    assert img_final.shape == shape
    return img_final
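
Stripped of the TensorFlow plumbing, one step of p_sample is just $x_{t-1}=\mu_\theta(x_t,t)+\sigma_t z$, with the noise switched off at $t=0$. A NumPy paraphrase (my own sketch; the function name is hypothetical):

import numpy as np

def p_sample_step(model_mean, model_log_variance, t, rng):
    # mirrors `sample = model_mean + nonzero_mask * tf.exp(0.5 * model_log_variance) * noise`
    noise = rng.normal(size=np.shape(model_mean))
    nonzero_mask = 0.0 if t == 0 else 1.0  # no noise on the final (t == 0) step
    return model_mean + nonzero_mask * np.exp(0.5 * model_log_variance) * noise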
