Pytorch 复习总结 4

Pytorch 复习总结，仅供笔者使用，参考教材：

《动手学深度学习》
Stanford University: Practical Machine Learning

本文主要内容为：Pytorch 深度学习计算。

本文先介绍了深度学习中自定义层和块的方法，然后介绍了一些有关参数的管理和读写方法，最后介绍了 GPU 的使用方法。

Pytorch 语法汇总：

Pytorch 张量的常见运算、线性代数、高等数学、概率论部分见 Pytorch 复习总结1；
Pytorch 线性神经网络部分见 Pytorch 复习总结2；
Pytorch 多层感知机部分见 Pytorch 复习总结3；
Pytorch 深度学习计算部分见 Pytorch 复习总结4；
Pytorch 卷积神经网络部分见 Pytorch 复习总结5；
Pytorch 现代卷积神经网络部分见 Pytorch 复习总结6；

层是神经网络的基本组成单元，如全连接层、卷积层、池化层等。块是由层组成的更大的功能单元，用于构建复杂的神经网络结构。块可以是一系列相互关联的层，形成一个功能完整的单元，也可以是一组层的重复模式，用于实现重复的结构。下图就是多个层组合成块形成的更大模型：
在这里插入图片描述

在实际应用中，经常会需要自定义层和块。

一. 自定义块

1. 顺序块

nn.Sequential 本质上就是一个顺序块，通过在块中实例化层来创建神经网络。 nn.Module 是 PyTorch 中用于构建神经网络模型的基类，nn.Sequential 和各种层都是继承自 Module，nn.Sequential 维护一个由多个层组成的有序列表，列表中的每个层连接在一起，将每个层的输出作为下一个层的输入。

如果想要自定义一个顺序块，必须要定义以下两个关键函数：

构造函数：将每个层按顺序逐个加入列表；
前向传播函数：将每一层按顺序传递给下一层；

import torch
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module

    def forward(self, X):
        # self._modules的类型是OrderedDict
        for block in self._modules.values():
            X = block(X)
        return X

net = MySequential(
    nn.Linear(20, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
)

X = torch.rand(2, 20)
output = net(X)

上述示例代码中，定义 net 时会自动调用 __init__(self, *args) 函数，实例化 MySequential 对象；调用 net(X) 相当于 net.__call__(X)，会自动调用模型类中定义的 forward() 函数，进行前向传播，每一层的传播本质上就是调用 block(X) 的过程。

2. 自定义前向传播

nn.Sequential 类将前向传播过程封装成函数，用户可以自由使用但没法修改传播细节。如果想要自定义前向传播过程中的细节，就需要自定义顺序块及 forward 函数，而不能仅仅依赖预定义的框架。

例如，需要一个计算函数 $f(\bold x,\bold w)=c \cdot \bold w ^T \bold x$ 的层，并且在传播过程中引入控制流。其中 $\bold x$ 是输入， $\bold w$ 是参数， $c$ 是优化过程中不需要更新的指定常量。为此，定义 FixedHiddenMLP 类如下：

import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)    # 优化过程中不需要更新的指定常量
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)          # 两个全连接层共享参数
        while X.abs().sum() > 1:    # 控制流
            X /= 2
        return X

3. 嵌套块

多个层可以组合成块，多个块还可以嵌套形成更大的模型：

import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)    # 优化过程中不需要更新的指定常量
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)          # 两个全连接层共享参数
        while X.abs().sum() > 1:    # 控制流
            X /= 2
        return X.sum()

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        return self.linear(self.net(X))

net = nn.Sequential(
    NestMLP(), 
    nn.Linear(16, 20), 
    FixedHiddenMLP()
)

X = torch.rand(2, 20)
output = net(X)

二. 自定义层

和自定义块一样，自定义层也需要实现构造函数和前向传播函数。

1. 无参数层

import torch
from torch import nn

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())
X = torch.rand(4, 8)
output = net(X)
print(output.mean())	# tensor(0., grad_fn=<MeanBackward0>)

2. 有参数层

import torch
from torch import nn
import torch.nn.functional as F

class MyLinear(nn.Module):
    def __init__(self, in_units, out_units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, out_units))
        self.bias = nn.Parameter(torch.randn(out_units,))
    def forward(self, X):
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)

net = nn.Sequential(
    MyLinear(64, 8), 
    MyLinear(8, 1)
)
X = torch.rand(2, 64)
output = net(X)
print(output)       # tensor([[11.9497], [13.9729]])

三. 参数管理

在实验过程中，有时需要提取参数，以便检查或在其他环境中复用。本节将介绍参数的访问方法和参数的初始化。

1. 参数访问

net.state_dict() / net[i].state_dict()：返回模型或某一层参数的状态字典；
net[i].weight.data / net[i].bias.data：返回某一层的权重 / 偏置参数；
net[i].weight.grad：返回某一层的权重参数的梯度属性。只有调用了 backward() 方法后才能访问到梯度值，否则为 None；

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
output = net(X)

print(net.state_dict())
'''
OrderedDict([('0.weight', tensor([[ 0.2178, -0.3286,  0.4875, -0.0347],
        [-0.0415,  0.0009, -0.2038, -0.1813],
        [-0.2766, -0.4759, -0.3134, -0.2782],
        [ 0.4854,  0.0606,  0.1070,  0.0650],
        [-0.3908,  0.2412, -0.1348,  0.3921],
        [-0.3044, -0.0331, -0.1213, -0.1690],
        [-0.3875, -0.0117,  0.3195, -0.1748],
        [ 0.1840, -0.3502,  0.4253,  0.2789]])), ('0.bias', tensor([-0.2327, -0.0745,  0.4923, -0.1018,  0.0685,  0.4423, -0.2979,  0.1109])), ('2.weight', tensor([[ 0.1006,  0.2959, -0.1316, -0.2015,  0.2446, -0.0158,  0.2217, -0.2780]])), ('2.bias', tensor([0.2362]))])
'''
print(net[2].state_dict())
'''
OrderedDict([('weight', tensor([[ 0.1006,  0.2959, -0.1316, -0.2015,  0.2446, -0.0158,  0.2217, -0.2780]])), ('bias', tensor([0.2362]))])
'''
print(net[2].bias)
'''
Parameter containing:
tensor([0.2362], requires_grad=True)
'''
print(net[2].bias.data)
'''
tensor([0.2362])
'''

如果想一次性访问所有参数，可以使用 for 循环递归遍历：

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
output = net(X)

print(*[(name, param.data) for name, param in net[0].named_parameters()])
'''
('weight', tensor([[-0.0273, -0.4942, -0.0880,  0.3169],
        [ 0.2205,  0.3344, -0.4425, -0.0882],
        [ 0.1726, -0.0007, -0.0256, -0.0593],
        [-0.3854, -0.0934, -0.4641,  0.1950],
        [ 0.2358, -0.4820, -0.2315,  0.1642],
        [-0.2645,  0.2021,  0.3167, -0.0042],
        [ 0.1714, -0.2201, -0.3326, -0.2908],
        [-0.3196,  0.0584, -0.1059,  0.0256]])) ('bias', tensor([ 0.3285,  0.4167, -0.2343,  0.3099,  0.1576, -0.0397, -0.2190, -0.3854]))
'''
print(*[(name, param.shape) for name, param in net.named_parameters()])
'''
('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8])) ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))
'''

如果网络是由多个块相互嵌套的，可以按块索引后再访问参数：

import torch
from torch import nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block {i}', block1())
    return net

net = nn.Sequential(block2(), nn.Linear(4, 1))
X = torch.rand(size=(2, 4))
output = net(X)

print(net)
'''
Sequential(
  (0): Sequential(
    (block 0): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 1): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 2): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 3): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
  )
  (1): Linear(in_features=4, out_features=1, bias=True)
)
'''
print(net[0][1][0].bias.data)
'''
tensor([-0.0083,  0.2490,  0.1794,  0.1927,  0.1797,  0.1156,  0.4409,  0.1320])
'''

2. 参数初始化

PyTorch 的 nn.init 模块提供了多种初始化方法：

nn.init.constant_(layer.weight, c)：将权重参数初始化为指定的常量值；
nn.init.zeros_(layer.weight)：将权重参数初始化为 0；
nn.init.ones_(layer.weight)：将权重参数初始化为 1；
nn.init.uniform_(layer.weight, a, b)：将权重参数按均匀分布初始化；
nn.init.xavier_uniform_(layer.weight)：
nn.init.normal_(layer.weight, mean, std)：将权重参数按正态分布初始化；
nn.init.orthogonal_(layer.weight)：将权重参数初始化为正交矩阵；
nn.init.sparse_(layer.weight, sparsity, std)：将权重参数初始化为稀疏矩阵；

初始化时，可以直接 net.apply(init_method) 初始化整个网络，也可以 net[i].apply(init_method) 初始化某一层：

import torch
from torch import nn

def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
output = net(X)

# net.apply(init_normal)
net[0].apply(init_normal)
net[2].apply(init_constant)

3. 延后初始化

有些情况下，无法提前判断网络的输入维度。为了代码能够继续运行，需要使用延后初始化，即直到数据第一次通过模型传递时，框架才会动态地推断出每个层的大小。由于 PyTorch 的延后初始化功能还处于开发阶段，API 和功能随时可能变化，下面只给出简单示例：

import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

print(net)
'''
Sequential(
  (0): LazyLinear(in_features=0, out_features=256, bias=True)
  (1): ReLU()
  (2): LazyLinear(in_features=0, out_features=10, bias=True)
)
'''

X = torch.rand(2, 20)
net(X)
print(net)
'''
Sequential(
  (0): Linear(in_features=20, out_features=256, bias=True)
  (1): ReLU()
  (2): Linear(in_features=256, out_features=10, bias=True)
)
'''

四. 文件读写

可以使用 torch.load(file) 和 torch.save(x, file) 函数读写张量和模型参数。

1. 加载和保存张量

只要保证读写格式一致即可：

import torch
from torch import nn
from torch.nn import functional as F

x = torch.arange(4)
y = torch.zeros(4)
torch.save([x, y],'xy.pth')

x2, y2 = torch.load('xy.pth')
print(x2, y2)       # tensor([0, 1, 2, 3]) tensor([0., 0., 0., 0.])

保存张量的文件格式没有要求，甚至可以没有后缀名。因为 torch.save(x, file) 函数本质上是使用 Python 的 pickle 模块来序列化对象并将其保存到文件中的，pickle 模块负责将 Python 对象转换为字节流，而文件的扩展名本身并不影响 pickle 模块的工作。

2. 加载和保存模型参数

因为模型一般是自定义的类，所以加载模型前要先实例化一个相同类别的变量，再将模型参数加载到该变量中：

import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.output = nn.Linear(2, 3)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 4))
Y = net(X)
print(net.state_dict())
'''
OrderedDict([('hidden.weight', tensor([[-0.0154, -0.3586, -0.3653, -0.2950],
        [ 0.2591, -0.2563,  0.3833,  0.1449]])), ('hidden.bias', tensor([0.1884, 0.3998])), ('output.weight', tensor([[-0.4805,  0.4077],
        [-0.0933,  0.0584],
        [ 0.3114,  0.6285]])), ('output.bias', tensor([-0.2552, -0.6520,  0.3290]))])
'''

torch.save(net.state_dict(), 'mlp.pth')

net2 = MLP()
net2.load_state_dict(torch.load('mlp.pth'))
print(net2.state_dict())
'''
OrderedDict([('hidden.weight', tensor([[-0.0154, -0.3586, -0.3653, -0.2950],
        [ 0.2591, -0.2563,  0.3833,  0.1449]])), ('hidden.bias', tensor([0.1884, 0.3998])), ('output.weight', tensor([[-0.4805,  0.4077],
        [-0.0933,  0.0584],
        [ 0.3114,  0.6285]])), ('output.bias', tensor([-0.2552, -0.6520,  0.3290]))])
'''

也可以只保存单层参数：

import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.output = nn.Linear(2, 3)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 4))
Y = net(X)
print(net.state_dict())
'''
OrderedDict([('hidden.weight', tensor([[-0.2937,  0.1589,  0.2349,  0.1130],
        [ 0.4170,  0.2699,  0.3760,  0.0201]])), ('hidden.bias', tensor([ 0.3914, -0.1185])), ('output.weight', tensor([[0.0884, 0.2572],
        [0.1547, 0.0164],
        [0.3386, 0.5151]])), ('output.bias', tensor([-0.5032, -0.2515, -0.4531]))])
'''

torch.save(net.hidden.state_dict(), 'mlp.pth')

net2 = MLP()
net2.hidden.load_state_dict(torch.load('mlp.pth'))
print(net2.state_dict())
'''
OrderedDict([('hidden.weight', tensor([[-0.2937,  0.1589,  0.2349,  0.1130],
        [ 0.4170,  0.2699,  0.3760,  0.0201]])), ('hidden.bias', tensor([ 0.3914, -0.1185])), ('output.weight', tensor([[ 0.2318,  0.3837],
        [ 0.2380,  0.6463],
        [-0.6014,  0.3717]])), ('output.bias', tensor([-0.3154, -0.0078, -0.2676]))])
'''

五. GPU 计算

在 PyTorch 中，CPU 和 GPU 分别可以用 torch.device('cpu') 和 torch.device('cuda') 表示。如果有多个 GPU，可以使用 torch.device(fcuda:i') 来表示，cuda:0 和 cuda 等价。可以查询所有可用 GPU 也可以指定 GPU：

import torch

def try_gpu(i=0):
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

def try_all_gpus():
    devices = [torch.device(f'cuda:{i}')
             for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

张量和网络模型都可以通过 .cuda(device) 函数移动到 GPU 上：

import torch
from torch import nn

net = nn.Sequential(
    nn.Linear(2, 64),
    nn.ReLU(),
    nn.Linear(64, 32)
)
X = torch.randn(size=(4, 2))
loss = nn.MSELoss()

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

net.cuda(device=device)
X.cuda(device=device)
loss.cuda(device=device)