Question 1: Changing the last layer of a model, deleting the last layer, and adding layers.

Changing the last layer of the model

import torch.nn as nn
from torchvision import models

# Load the model (weights=None gives random initialization;
# torchvision versions before 0.13 spell this pretrained=False)
model = models.resnet18(weights=None)

# Get the number of features going into the last layer; we need this to replace the final layer.
num_final_in = model.fc.in_features

# The final layer of ResNet is model.fc, so we can simply overwrite it
# so that the output size equals the number of classes we need, say 300.
NUM_CLASSES = 300
model.fc = nn.Linear(num_final_in, NUM_CLASSES)

If the last layer of a network is not named fc, we can first look up the name of the last layer and then replace it by that name.

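from torchvision import models

# Load the model
model = models.resnet18(weights=None)

# Print the name and type of every layer; the final layer appears near the end
for name, module in model.named_modules():
    print(name, type(module).__name__)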
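Once you know the name, you can swap the layer. Below is a minimal sketch built on a hypothetical helper, replace_module(). It is not a PyTorch API. It walks the dotted name printed by named_modules() with getattr() and overwrites the last component with setattr(). The real model is replaced by a small toy network whose head lives under classifier (as in torchvision's VGG-style models), so the example runs without downloading anything.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def replace_module(model: nn.Module, name: str, new_module: nn.Module) -> None:
    """Hypothetical helper: replace the submodule at dotted path `name`."""
    *parent_path, child_name = name.split(".")
    parent = model
    for part in parent_path:
        parent = getattr(parent, part)   # walk down, e.g. model.classifier
    setattr(parent, child_name, new_module)  # registers new_module under child_name


# Toy network whose final layer is "classifier.1" rather than "fc"
model = nn.Sequential(OrderedDict([
    ("features", nn.Sequential(nn.Linear(32, 16), nn.ReLU())),
    ("classifier", nn.Sequential(nn.Dropout(0.2), nn.Linear(16, 10))),
]))

# named_modules() yields modules in registration order; here the last one is the head
last_name, last_layer = list(model.named_modules())[-1]
print(last_name)  # classifier.1

NUM_CLASSES = 300
replace_module(model, last_name, nn.Linear(last_layer.in_features, NUM_CLASSES))

out = model(torch.randn(4, 32))
print(out.shape)  # torch.Size([4, 300])
```

Registration order usually matches execution order, but it does not always. Check the printed list before assuming the last entry is the output layer.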

Deleting the last layer

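As before, we can get the layers with model.children() and turn them into a list with list(). We can then delete the last layer by indexing the list. Finally, we stack the modified list into a new model with PyTorch's nn.Sequential() container. You can edit the list any way you like. For example, if you want the image features from the third-to-last layer, you can remove the last two layers!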

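You can even delete layers from the middle of a model. Most layers change the size or channel count of the feature map, though, so the layer after the deleted one will usually receive the wrong number of input features. In that case, you can index that specific layer and overwrite it with one whose input size matches!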
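The sketch below removes a layer from the middle and then overwrites the next layer. It uses a hypothetical toy convolutional network instead of ResNet, which keeps the channel counts easy to follow. The replacement conv is freshly initialized, so it has to be trained.

```python
import torch
import torch.nn as nn

# Hypothetical toy network: 3 -> 16 -> 32 -> 64 channels, then a classifier
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

# Remove the middle conv block (indices 2 and 3) from the list of children
layers = list(model.children())
del layers[2:4]

# The conv that expected 32 input channels now receives 16, so the forward pass fails
try:
    nn.Sequential(*layers)(torch.randn(2, 3, 8, 8))
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True
print("shape mismatch before fix:", mismatch_detected)  # True

# Overwrite the offending layer (now at index 2) with one whose input size matches
pruned = nn.Sequential(*layers)
pruned[2] = nn.Conv2d(16, 64, kernel_size=3, padding=1)

out = pruned(torch.randn(2, 3, 8, 8))
print(out.shape)  # torch.Size([2, 10])
```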

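import torch.nn as nn
from torchvision import models

# Load the model
model = models.resnet18(weights=None)

# Remove the last layer (fc); the output is the pooled features, shape (N, 512, 1, 1)
new_model = nn.Sequential(*list(model.children())[:-1])

# Remove the last two layers (avgpool and fc) to get the features of the third-to-last layer (layer4)
new_model_2_removed = nn.Sequential(*list(model.children())[:-2])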
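The list approach has one subtlety. nn.Sequential(*list(model.children())) keeps only the submodules. Any functional operations inside the original forward() are lost. torchvision's ResNet calls torch.flatten(x, 1) between avgpool and fc, which is why the truncated model above returns a 4-D tensor. The sketch below uses a small hypothetical stand-in with the same pattern, so it runs without torchvision. It shows how to put nn.Flatten() back when you append new layers to the list.

```python
import torch
import torch.nn as nn


class TinyResNetLike(nn.Module):
    """Hypothetical stand-in mimicking ResNet's tail: ... -> avgpool -> flatten -> fc."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 512, kernel_size=3, padding=1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.avgpool(self.conv(x))
        x = torch.flatten(x, 1)  # functional op: NOT a child module
        return self.fc(x)


model = TinyResNetLike()
x = torch.randn(2, 3, 16, 16)

# Dropping fc keeps conv and avgpool, but the flatten step is gone
backbone = nn.Sequential(*list(model.children())[:-1])
print(backbone(x).shape)  # torch.Size([2, 512, 1, 1])

# When appending new layers to the list, add nn.Flatten() back explicitly
extended = nn.Sequential(
    *list(model.children())[:-1],
    nn.Flatten(),
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.Linear(128, 300),
)
print(extended(x).shape)  # torch.Size([2, 300])
```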

Adding layers

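Say we want to add a fully connected layer to the model we have now. One obvious way is to edit the list discussed above and append another layer to it. Often, however, we have already trained such a model and want to load it and add a new layer on top. As mentioned above, the model we load into must have the same architecture as the saved one, so we cannot simply edit the architecture before loading the weights.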

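Instead, we need to add layers on top of the loaded model. Doing this in PyTorch is simple: we just create a custom model! That brings us to the next section, custom models.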

Custom models

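Let's build a custom model. As mentioned above, part of the model will come from a trained network. That sounds complicated, right? Half of the model is trained and half is new. On top of that, we want some of it frozen and some of it updatable. Once you can do this, you can do just about anything with model architectures in PyTorch.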

# Some imports first
import torch
import torch.nn as nn
from torchvision import models

# New models are defined as classes. To create a model, we instantiate the class.
class Resnet_Added_Layers_Half_Frozen(nn.Module):
    def __init__(self, model_path=None):
        super().__init__()

        # Start with the ResNet model and swap out the final layer, because that is the model we defined above.
        model = models.resnet18(weights=None)
        num_final_in = model.fc.in_features
        model.fc = nn.Linear(num_final_in, 300)

        # Now that the architecture matches the one above, load the weights we trained above.
        # model_path is the checkpoint saved earlier (MODEL_PATH in the original code).
        if model_path is not None:
            checkpoint = torch.load(model_path, map_location="cpu")
            model.load_state_dict(checkpoint)

        # Freeze the same layers as above (same code, without the print statements).
        # ResNet-18 children: 0 conv1, 1 bn1, 2 relu, 3 maxpool, 4 layer1, 5 layer2,
        # 6 layer3, 7 layer4, 8 avgpool, 9 fc
        for child_counter, child in enumerate(model.children()):
            if child_counter < 6:
                # Freeze everything up to and including layer2
                for param in child.parameters():
                    param.requires_grad = False
            elif child_counter == 6:
                # In layer3, freeze only the first block
                for block_counter, block in enumerate(child.children()):
                    if block_counter < 1:
                        for param in block.parameters():
                            param.requires_grad = False
            else:
                print("child", child_counter, "was not frozen")

        # Now define the new layers we want to add on top.
        # These are just submodules stored on self; the "adding on top" is defined by forward(),
        # which decides how the input data flows through the model.

        # NOTE: the loaded model must also be assigned to self so its parameters are registered.
        # Drop its fc layer ([:-1]) so the backbone outputs the 512-d pooled features
        # that self.projective expects.
        self.vismodel = nn.Sequential(*list(model.children())[:-1])
        self.projective = nn.Linear(512, 400)
        self.nonlinearity = nn.ReLU(inplace=True)
        self.projective2 = nn.Linear(400, 300)

    # forward() defines the flow of the input data and thus decides which layer/chunk goes on top of what.
    def forward(self, x):
        x = self.vismodel(x)     # (N, 512, 1, 1)
        x = torch.flatten(x, 1)  # (N, 512); unlike squeeze(), keeps the batch dim when N == 1
        x = self.projective(x)
        x = self.nonlinearity(x)
        x = self.projective2(x)
        return x
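Freezing matters only if training actually leaves the frozen tensors alone. A common practice is to count trainable parameters as a sanity check and pass only the requires_grad=True parameters to the optimizer. This sketch is not part of the original code. A hypothetical toy backbone and head stand in for Resnet_Added_Layers_Half_Frozen, so the example runs without torchvision or a checkpoint.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: a "pretrained" backbone plus new layers on top
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
head = nn.Sequential(nn.Linear(64, 40), nn.ReLU(), nn.Linear(40, 30))
model = nn.Sequential(backbone, head)

# Freeze the backbone, as the custom model does for the early ResNet children
for param in backbone.parameters():
    param.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable {n_trainable} / total {n_total}")

# Hand only the unfrozen parameters to the optimizer
optimizer = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)

# One training step: the frozen weights must come out unchanged
frozen_before = backbone[0].weight.clone()
loss = model(torch.randn(8, 32)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(torch.equal(frozen_before, backbone[0].weight))
```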

Custom loss functions

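Now that we have our model, we can load anything and build any architecture we want. That leaves two important components of any pipeline: loading the data and training. Let's look at training. The two most important components of this step are the optimizer and the loss function. The loss function quantifies how far the current model is from the target we want to reach. The optimizer decides how to update the parameters so as to minimize that loss.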
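A minimal training loop shows how the work is split between the loss function and the optimizer. The sketch below fits a hypothetical one-feature linear regression with the built-in nn.MSELoss and torch.optim.SGD. The data, learning rate, and step count are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: y = 3x + 1 plus a little noise
x = torch.randn(64, 1)
y = 3 * x + 1 + 0.01 * torch.randn(64, 1)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()                                   # how far predictions are from targets
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # how parameters get updated

losses = []
for step in range(100):
    pred = model(x)
    loss = loss_fn(pred, y)   # scalar tensor
    optimizer.zero_grad()     # clear gradients from the previous step
    loss.backward()           # compute d(loss)/d(param) for every parameter
    optimizer.step()          # update the parameters using those gradients
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.5f}")
print(model.weight.item(), model.bias.item())  # close to 3 and 1
```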

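Sometimes we need to define our own loss function. Here are a few things to know:

Custom loss functions are also defined as custom classes. Like custom models, they inherit from torch.nn.Module.
Often we need to change the shape of one of the inputs. This can be done with the view() function.
To add a dimension to a tensor, use the unsqueeze() function.
The final value returned by the loss function must be a scalar (a 0-dimensional tensor), not a vector or higher-dimensional tensor, because backward() can only be called without arguments on a scalar.
The returned value must be part of the autograd graph so that it can be used to update the parameters. In older PyTorch this meant it had to be a Variable, and the easiest way to ensure that was to pass in x and y as Variables, so that any function of them was also a Variable. Since PyTorch 0.4, Variable has been merged into Tensor, and any tensor computed from the model output is tracked automatically.
A PyTorch Variable was simply a PyTorch tensor whose operations PyTorch tracked so that it could backpropagate to obtain gradients. Today every tensor with requires_grad=True behaves this way.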
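Each of these points can be checked in a few lines. This sketch uses plain tensors with requires_grad=True, which is the modern replacement for Variable. The shapes are arbitrary.

```python
import torch

torch.manual_seed(0)

# Since PyTorch 0.4 plain tensors track gradients; Variable is no longer needed
pred = torch.randn(4, 6, requires_grad=True)  # stands in for a model output
target = torch.randn(4, 2, 3)

# view() reshapes pred to target's shape (both hold 24 elements)
pred_reshaped = pred.view(4, 2, 3)

# unsqueeze() inserts a size-1 dimension: (4, 6) -> (4, 1, 6)
print(pred.unsqueeze(1).shape)  # torch.Size([4, 1, 6])

# Reduce to a scalar (0-dim tensor) so backward() needs no arguments
loss = ((pred_reshaped - target) ** 2).sum()
print(loss.dim())  # 0
loss.backward()
print(pred.grad.shape)  # torch.Size([4, 6])

# A per-sample (non-scalar) loss cannot call backward() without a gradient argument
per_sample = ((pred.view(4, 2, 3) - target) ** 2).sum(dim=(1, 2))
try:
    per_sample.backward()
    non_scalar_failed = False
except RuntimeError:
    non_scalar_failed = True
print("non-scalar backward raised:", non_scalar_failed)  # True
```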

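Here is a custom loss called Regress_Loss that takes two inputs, x and y. It reshapes x to match y and returns the loss as the sum of squared (L2) differences between the reshaped x and y. This is a standard pattern you will run into often when training networks.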

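Suppose x has shape (5, 10) and y has shape (5, 5, 10). We need to add a dimension to x and then repeat x along that new dimension to match y's shape. Then (y - x) has shape (5, 5, 10). We sum over all three dimensions to get a scalar. Note that torch.sum() without a dim argument already reduces every dimension in a single call.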

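This pattern comes up often and is worth mastering. It follows the same rules as broadcasting in NumPy: PyTorch automatically expands the size-1 dimension created by unsqueeze(), so the explicit repeat() is optional.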
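Here is a quick check using x of shape (5, 10) and y of shape (5, 5, 10). The explicit repeat() used in Regress_Loss and plain broadcasting give the same scalar. There is one caveat: the unsqueeze(1) is still required. Without it, x would be broadcast as (1, 5, 10), pairing its batch dimension with y's second dimension. Both sizes happen to be 5 here, so that version would run without error and silently compute the wrong loss.

```python
import torch

x = torch.randn(5, 10)     # one 10-d vector per batch item
y = torch.randn(5, 5, 10)  # five 10-d targets per batch item

# Explicit version, as in Regress_Loss: (5, 10) -> (5, 1, 10) -> (5, 5, 10)
x_repeated = x.unsqueeze(1).repeat(1, y.size(1), 1)
loss_repeat = torch.sum((y - x_repeated) ** 2)

# Broadcasting version: the size-1 dim is expanded implicitly, without copying data
loss_broadcast = torch.sum((y - x.unsqueeze(1)) ** 2)

print(x_repeated.shape)                             # torch.Size([5, 5, 10])
print(torch.allclose(loss_repeat, loss_broadcast))  # True
```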

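import torch


# Custom loss: squared L2 distance between x (N, D) and every slice of y (N, K, D) along dim 1
class Regress_Loss(torch.nn.Module):

    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        num_words = y.size(1)         # K: targets per sample (NUM_WORDS in the original)
        x_added_dim = x.unsqueeze(1)  # (N, D) -> (N, 1, D)
        x_stacked_along_dimension1 = x_added_dim.repeat(1, num_words, 1)  # (N, K, D)
        diff = torch.sum((y - x_stacked_along_dimension1) ** 2, 2)        # (N, K)
        totloss = torch.sum(diff)     # sum over all remaining dims -> scalar
        return totloss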