Table of Contents
- 1. LayerNorm
- 2. Illustration
- 3. softmax
- 4. Python code
1. LayerNorm
$$
y=\frac{x-\mathrm{E}[x]}{\sqrt{\mathrm{var}(x)+\epsilon}}\cdot\gamma+\beta
$$
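Here γ and β are learnable per-feature parameters. In PyTorch's `nn.LayerNorm` they are initialized to ones and zeros respectively, so a freshly constructed layer performs pure normalization. A quick check (a short sketch, not part of the script in section 4):

```python
import torch.nn as nn

ln = nn.LayerNorm(4)  # elementwise_affine=True by default
print(ln.weight)      # gamma: tensor([1., 1., 1., 1.]) initially
print(ln.bias)        # beta:  tensor([0., 0., 0., 0.]) initially
print(ln.eps)         # 1e-05, added to var(x) for numerical stability
```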
2. Illustration
- The matrix A is shown below:
$$
A=\begin{bmatrix} 0&1&2&3\\ 4&5&6&7\\ 8&9&10&11 \end{bmatrix}
$$

- In PyTorch, each row represents one sample: here there are 3 rows, i.e. 3 samples, each with 4 feature dimensions.
- LayerNorm normalizes each sample over its feature dimension, i.e. over [0, 1, 2, 3] for the first row: it computes the mean E(x) and the variance var(x), then applies the formula above, as worked out below and checked in the short sketch that follows.
$$
E(x)=(0+1+2+3)/4=1.5
$$

$$
\sqrt{\mathrm{var}(x)}=\sqrt{[(0-1.5)^2+(1-1.5)^2+(2-1.5)^2+(3-1.5)^2]/4}=1.118
$$

$$
y=\left[\frac{0-1.5}{1.118},\frac{1-1.5}{1.118},\frac{2-1.5}{1.118},\frac{3-1.5}{1.118}\right]=[-1.342,-0.447,0.447,1.342]
$$
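These numbers can be reproduced in a few lines (a minimal sketch using the population variance, i.e. `unbiased=False`, to match the formula):

```python
import torch

x = torch.tensor([0., 1., 2., 3.])
mean = x.mean()                      # 1.5
std = x.var(unbiased=False).sqrt()   # 1.118, population standard deviation
print((x - mean) / std)              # tensor([-1.342, -0.447,  0.447,  1.342])
```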
3. softmax
F.softmax normalizes the elements of a vector into the range (0, 1) so that they sum to 1.
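For reference, the standard definition is:

$$
\mathrm{softmax}(x)_i=\frac{e^{x_i}}{\sum_j e^{x_j}}
$$

Because adding a constant to every element cancels out in this ratio, softmax is shift-invariant; this is why the rows [0, 1, 2, 3], [4, 5, 6, 7], and [8, 9, 10, 11] in the code below all produce the same output.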
4. Python code
```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.set_printoptions(precision=3, sci_mode=False)

if __name__ == "__main__":
    row = 3
    column = 4
    total = row * column
    # Build the 3x4 matrix A from the illustration above.
    matrix_a = torch.arange(total).reshape((row, column)).to(torch.float)
    print(f"matrix_a=\n{matrix_a}")

    # Softmax over the last dimension (dim=-1): each row is normalized independently.
    soft_a = F.softmax(matrix_a, dim=-1)
    print(f"soft_a=\n{soft_a}")

    # Manual softmax for each row; all three rows give the same result
    # because softmax is invariant to adding a constant to every element.
    a = torch.tensor([0, 1, 2, 3]).to(torch.float)
    a_softmax = torch.exp(a) / torch.sum(torch.exp(a))
    print(f"a_softmax={a_softmax}")
    b = torch.tensor([4, 5, 6, 7]).to(torch.float)
    b_softmax = torch.exp(b) / torch.sum(torch.exp(b))
    print(f"b_softmax={b_softmax}")
    c = torch.tensor([8, 9, 10, 11]).to(torch.float)  # third row (the original duplicated b here)
    c_softmax = torch.exp(c) / torch.sum(torch.exp(c))
    print(f"c_softmax={c_softmax}")

    # Softmax over dim=-2 normalizes each column instead.
    soft_dim1 = F.softmax(matrix_a, dim=-2)
    print(f"soft_dim1=\n{soft_dim1}")

    # Manual softmax for the first two columns.
    d = torch.tensor([0, 4, 8]).to(torch.float)
    d_softmax = torch.exp(d) / torch.sum(torch.exp(d))
    print(f"d_softmax={d_softmax}")
    d2 = torch.tensor([1, 5, 9]).to(torch.float)
    d2_softmax = torch.exp(d2) / torch.sum(torch.exp(d2))
    print(f"d2_softmax={d2_softmax}")

    # LayerNorm over the last dimension of size 4; gamma starts at 1, beta at 0.
    layer_norma = nn.LayerNorm(4)
    layer_norma_eps = layer_norma.eps
    layer_norma_weight = layer_norma.weight
    print(f"layer_norma_eps=\n{layer_norma_eps}")
    print(f"layer_norma_weight=\n{layer_norma_weight}")
    layer_matrix = layer_norma(matrix_a)
    print(f"layer_matrix=\n{layer_matrix}")

    # Reproduce LayerNorm by hand: per-row mean and population standard deviation.
    mean_a = torch.mean(matrix_a, dim=-1, keepdim=True)
    print(f"mean_a=\n{mean_a}")
    std_a = torch.sqrt(torch.var(matrix_a, dim=-1, keepdim=True, unbiased=False))
    print(f"std_a=\n{std_a}")
    my_layer = (matrix_a - mean_a) / std_a
    print(f"my_layer=\n{my_layer}")

    # Spell out the same computation for the first row only.
    a1 = matrix_a[0, :]
    print(f"a1=\n{a1}")
    a1_mean = torch.sum(a1) / 4
    print(f"a1_mean={a1_mean}")
    a1_var = torch.sqrt(torch.sum((a1 - a1_mean) ** 2) / 4)  # this is the standard deviation
    print(f"a1_var={a1_var}")
    var1 = math.sqrt(((0 - 1.5) ** 2 + (1 - 1.5) ** 2 + (2 - 1.5) ** 2 + (3 - 1.5) ** 2) / 4)
    print(f"var1={var1}")
```
- Output:
```
matrix_a=
tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
soft_a=
tensor([[0.032, 0.087, 0.237, 0.644],
        [0.032, 0.087, 0.237, 0.644],
        [0.032, 0.087, 0.237, 0.644]])
a_softmax=tensor([0.032, 0.087, 0.237, 0.644])
b_softmax=tensor([0.032, 0.087, 0.237, 0.644])
c_softmax=tensor([0.032, 0.087, 0.237, 0.644])
soft_dim1=
tensor([[ 0.000,  0.000,  0.000,  0.000],
        [ 0.018,  0.018,  0.018,  0.018],
        [ 0.982,  0.982,  0.982,  0.982]])
d_softmax=tensor([ 0.000,  0.018,  0.982])
d2_softmax=tensor([ 0.000,  0.018,  0.982])
layer_norma_eps=
1e-05
layer_norma_weight=
Parameter containing:
tensor([1., 1., 1., 1.], requires_grad=True)
layer_matrix=
tensor([[-1.342, -0.447,  0.447,  1.342],
        [-1.342, -0.447,  0.447,  1.342],
        [-1.342, -0.447,  0.447,  1.342]], grad_fn=<NativeLayerNormBackward0>)
mean_a=
tensor([[1.500],
        [5.500],
        [9.500]])
std_a=
tensor([[1.118],
        [1.118],
        [1.118]])
my_layer=
tensor([[-1.342, -0.447,  0.447,  1.342],
        [-1.342, -0.447,  0.447,  1.342],
        [-1.342, -0.447,  0.447,  1.342]])
a1=
tensor([0., 1., 2., 3.])
a1_mean=1.5
a1_var=1.1180340051651
var1=1.118033988749895
```
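As a closing sanity check (a self-contained sketch, not part of the script above), the manual computation can be compared against nn.LayerNorm directly:

```python
import torch
import torch.nn as nn

x = torch.arange(12, dtype=torch.float).reshape(3, 4)
ln = nn.LayerNorm(4)
manual = (x - x.mean(dim=-1, keepdim=True)) / torch.sqrt(
    x.var(dim=-1, keepdim=True, unbiased=False) + ln.eps
)
print(torch.allclose(ln(x), manual, atol=1e-5))  # True
```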