Model Initialization

news2025/4/25 3:48:23

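When training a deep learning model, the initial values of the weights matter a great deal. A good initialization speeds up convergence and can improve the final accuracy. We generally do not train a network starting from all-zero weights. To make training easier and shorten convergence time, the model should be initialized sensibly, and PyTorch provides the common initialization methods in torch.nn.init. In this chapter you will learn:

Common initialization functions.
How to use the initialization functions.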
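
To illustrate why all-zero weights are avoided, here is a minimal sketch on a toy network. The layer sizes, the random data, and the variable names are assumptions chosen only for this illustration. With every parameter set to zero, the hidden activations and the output weights are both zero, so no gradient reaches any weight matrix and only the output bias can learn.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Toy two-layer network (sizes are illustrative assumptions)
net = nn.Sequential(nn.Linear(4, 3), nn.Tanh(), nn.Linear(3, 1))

# Set every weight and bias to zero
for p in net.parameters():
    nn.init.zeros_(p)

x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss = nn.functional.mse_loss(net(x), y)
loss.backward()

hidden_grad = net[0].weight.grad      # all zeros: no signal reaches the first layer
out_weight_grad = net[2].weight.grad  # all zeros: hidden activations are tanh(0) = 0
out_bias_grad = net[2].bias.grad      # the only parameter with a nonzero gradient
```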

Contents of torch.nn.init

The initialization module provides the following functions:

torch.nn.init.uniform_(tensor, a=0.0, b=1.0)
torch.nn.init.normal_(tensor, mean=0.0, std=1.0)
torch.nn.init.constant_(tensor, val)
torch.nn.init.ones_(tensor)
torch.nn.init.zeros_(tensor)
torch.nn.init.eye_(tensor)
torch.nn.init.dirac_(tensor, groups=1)
torch.nn.init.xavier_uniform_(tensor, gain=1.0)
torch.nn.init.xavier_normal_(tensor, gain=1.0)
torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
torch.nn.init.orthogonal_(tensor, gain=1)
torch.nn.init.sparse_(tensor, sparsity, std=0.01)
torch.nn.init.calculate_gain(nonlinearity, param=None)
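
All of the functions in this list except calculate_gain end in an underscore, which in PyTorch marks an in-place operation. The sketch below, with an arbitrary tensor shape and arbitrary bounds, shows that the initializer overwrites the tensor it receives and returns that same tensor, while calculate_gain only returns a recommended scaling factor.

```python
import torch
from torch import nn

# An uninitialized 3x5 tensor; its contents are arbitrary until it is filled
w = torch.empty(3, 5)

# In-place: w itself is overwritten, and the same tensor object is returned
out = nn.init.uniform_(w, a=-0.1, b=0.1)
same_object = out is w

# calculate_gain touches no tensor; it returns a scaling factor for a nonlinearity
relu_gain = nn.init.calculate_gain("relu")  # sqrt(2)
tanh_gain = nn.init.calculate_gain("tanh")  # 5/3
```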

Explanation of each method

Uniform distribution

torch.nn.init.uniform_(tensor, a=0.0, b=1.0)
- tensor – an n-dimensional torch.Tensor
- a – the lower bound of the uniform distribution
- b – the upper bound of the uniform distribution
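
A small sketch of uniform_ follows. The tensor shape, the bounds, and the seed are assumptions picked for the example. Every sampled value lies between a and b, and for a large tensor the sample mean and standard deviation should be close to (a + b) / 2 and (b - a) / sqrt(12).

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

w = torch.empty(1000, 100)
nn.init.uniform_(w, a=-0.5, b=0.5)

lo, hi = w.min().item(), w.max().item()  # both inside [-0.5, 0.5]
mean = w.mean().item()                   # close to (a + b) / 2 = 0
std = w.std().item()                     # close to (b - a) / sqrt(12), about 0.2887
```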

Gaussian (normal) distribution

torch.nn.init.normal_(tensor, mean=0.0, std=1.0)
- tensor – an n-dimensional torch.Tensor
- mean – the mean of the normal distribution
- std – the standard deviation of the normal distribution
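
The sketch below uses normal_ with an illustrative std of 0.02 (an assumption, not a recommendation from this article). It also shows a common pitfall: the second positional argument is the mean, so normal_(v, 0.1) shifts the mean to 0.1 and leaves std at its default of 1.0.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

w = torch.empty(1000, 100)
nn.init.normal_(w, mean=0.0, std=0.02)
mean, std = w.mean().item(), w.std().item()

# Pitfall: the second positional argument is the mean, not the std
v = torch.empty(1000, 100)
nn.init.normal_(v, 0.1)  # draws from a normal with mean 0.1 and std 1.0
v_mean, v_std = v.mean().item(), v.std().item()
```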

Initialize to a constant

torch.nn.init.constant_(tensor, val)
- tensor – an n-dimensional torch.Tensor
- val – the value to fill the tensor with

Initialize to all ones

torch.nn.init.ones_(tensor)
- tensor – an n-dimensional torch.Tensor

Initialize to all zeros

torch.nn.init.zeros_(tensor)
- tensor – an n-dimensional torch.Tensor
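
The three fill-style initializers above (constant_, ones_, zeros_) can be sketched together. The shapes and the value 0.3 are arbitrary choices for the example.

```python
import torch
from torch import nn

# constant_: every element becomes val
b = torch.empty(4)
nn.init.constant_(b, 0.3)

# ones_ and zeros_: shorthand for constant_ with 1 and 0
o = nn.init.ones_(torch.empty(2, 3))
z = nn.init.zeros_(torch.empty(2, 3))
```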

Initialize to an identity matrix

torch.nn.init.eye_(tensor)
- tensor – a 2-dimensional torch.Tensor
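
As a rough illustration, eye_ puts ones on the main diagonal and zeros everywhere else, including for non-square tensors. Applied to the weight of a square Linear layer without bias, the layer then passes its input through unchanged. The layer sizes here are assumptions for the sketch.

```python
import torch
from torch import nn

# Non-square case: ones on the main diagonal, zeros elsewhere
w = torch.empty(3, 5)
nn.init.eye_(w)

# Square Linear layer without bias: the layer becomes an identity map
lin = nn.Linear(4, 4, bias=False)
nn.init.eye_(lin.weight)
x = torch.randn(2, 4)
y = lin(x)
```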

Xavier uniform distribution

torch.nn.init.xavier_uniform_(tensor, gain=1.0)
- tensor – an n-dimensional torch.Tensor
- gain – an optional scaling factor
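
xavier_uniform_ samples from U(-bound, bound) with bound = gain × sqrt(6 / (fan_in + fan_out)). The sketch below recomputes that bound by hand for a 2-D weight, where fan_out is the first dimension and fan_in the second, and compares it with the sampled values. The shape is an arbitrary assumption.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Shape follows the nn.Linear convention: (out_features, in_features)
w = torch.empty(100, 200)
nn.init.xavier_uniform_(w, gain=1.0)

fan_out, fan_in = w.shape
bound = 1.0 * math.sqrt(6.0 / (fan_in + fan_out))  # about 0.1414
max_abs = w.abs().max().item()                     # never exceeds bound
```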

Xavier normal distribution

torch.nn.init.xavier_normal_(tensor, gain=1.0)
- tensor – an n-dimensional torch.Tensor
- gain – an optional scaling factor
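
xavier_normal_ samples from a zero-mean normal with std = gain × sqrt(2 / (fan_in + fan_out)). For a convolution weight of shape (out_channels, in_channels, kH, kW), fan_in is in_channels × kH × kW and fan_out is out_channels × kH × kW. The sketch below checks the empirical std against that formula, using an arbitrary shape chosen for the example.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Conv2d-style weight: (out_channels, in_channels, kH, kW)
w = torch.empty(64, 32, 3, 3)
nn.init.xavier_normal_(w, gain=1.0)

receptive_field = 3 * 3
fan_in = 32 * receptive_field    # 288
fan_out = 64 * receptive_field   # 576
expected_std = math.sqrt(2.0 / (fan_in + fan_out))  # about 0.0481
empirical_std = w.std().item()
```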

He (Kaiming) uniform distribution

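torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
- tensor – an n-dimensional torch.Tensor
- a – the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
- mode – either 'fan_in' (default) or 'fan_out'
- nonlinearity – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default)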
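
kaiming_uniform_ samples from U(-bound, bound) with bound = gain × sqrt(3 / fan_mode), where gain comes from calculate_gain (sqrt(2) for 'relu'). The sketch below recomputes the bound for a Linear-style weight with mode='fan_in'. The shape is an assumption made for the example.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Linear-style weight: (out_features, in_features), so fan_in = 128
w = torch.empty(256, 128)
nn.init.kaiming_uniform_(w, mode="fan_in", nonlinearity="relu")

gain = nn.init.calculate_gain("relu")   # sqrt(2)
bound = gain * math.sqrt(3.0 / 128)     # about 0.2165
max_abs = w.abs().max().item()          # never exceeds bound
```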

He (Kaiming) normal distribution

torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
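
kaiming_normal_ samples from a zero-mean normal with std = gain / sqrt(fan_mode). The sketch below uses mode='fan_out' on a convolution-style weight, where fan_out is out_channels × kH × kW, and compares the empirical std with the formula. The shape is an arbitrary assumption.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Conv2d-style weight: (out_channels, in_channels, kH, kW)
w = torch.empty(64, 3, 7, 7)
nn.init.kaiming_normal_(w, mode="fan_out", nonlinearity="relu")

fan_out = 64 * 7 * 7                                 # 3136
gain = nn.init.calculate_gain("relu")                # sqrt(2)
expected_std = gain / math.sqrt(fan_out)             # about 0.0253
empirical_std = w.std().item()
```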

Using the initialization functions

Wrapping the initialization in a function

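import torch
from torch import nn

def initialize_weights(model):
    # Walk over every submodule and initialize it according to its type
    for m in model.modules():
        # Conv2d layers: Xavier normal weights
        if isinstance(m, nn.Conv2d):
            torch.nn.init.xavier_normal_(m.weight.data)
            # Only initialize the bias if the layer has one
            if m.bias is not None:
                torch.nn.init.constant_(m.bias.data, 0.3)
        elif isinstance(m, nn.Linear):
            # The second positional argument of normal_ is the mean,
            # so this draws from a normal with mean 0.1 and std 1.0
            torch.nn.init.normal_(m.weight.data, mean=0.1)
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias.data)
        elif isinstance(m, nn.BatchNorm2d):
            # BatchNorm2d: scale of 1 and shift of 0
            m.weight.data.fill_(1)
            # The Tensor method is zero_(); zeros_() does not exist on tensors
            m.bias.data.zero_()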

Define the model and call the initialization function.

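# Model definition
class MLP(nn.Module):
    # Declare the layers that hold parameters: one convolution layer and one fully connected layer
    def __init__(self, **kwargs):
        # Call the constructor of the parent class nn.Module for the required setup,
        # so that other keyword arguments can be passed when constructing an instance
        super(MLP, self).__init__(**kwargs)
        self.hidden = nn.Conv2d(1, 1, 3)
        self.act = nn.ReLU()
        self.output = nn.Linear(10, 1)

    # Forward computation: how the model output is computed from the input x
    def forward(self, x):
        o = self.act(self.hidden(x))
        return self.output(o)

mlp = MLP()
print(list(mlp.parameters()))
print("-------after initialization-------")

initialize_weights(mlp)
print(list(mlp.parameters()))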
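
The same idea can also be written with nn.Module.apply, which calls a function on every submodule. The sketch below is an alternative under stated assumptions, not the version used in this article: init_module is a hypothetical helper name, the Linear weights use std=0.1 rather than the article's mean of 0.1, and the initializers are called on the parameters directly instead of on .data.

```python
import torch
from torch import nn

def init_module(m: nn.Module) -> None:
    """Hypothetical per-module initializer, meant to be passed to Module.apply."""
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.3)
    elif isinstance(m, nn.Linear):
        # Assumption for this sketch: zero mean and a small std
        nn.init.normal_(m.weight, mean=0.0, std=0.1)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Same layers as the MLP above, written as a Sequential for brevity
net = nn.Sequential(nn.Conv2d(1, 1, 3), nn.ReLU(), nn.Linear(10, 1))
net.apply(init_module)  # visits every submodule, then the container itself

conv_bias = net[0].bias.detach()
linear_bias = net[2].bias.detach()

# A 12x12 input becomes 10x10 after the 3x3 convolution, matching Linear(10, 1)
out_shape = tuple(net(torch.randn(2, 1, 12, 12)).shape)
```

Checking a few parameters after initialization, as done with conv_bias and linear_bias here, is a quick way to confirm that each branch of the initializer actually ran.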