L1Loss computes the mean absolute error (MAE) between the prediction $y_{pred}$ and the target $y_{true}$: $L(y_{pred},y_{true})=\frac{1}{n}\sum_{i=1}^{n}|y_{pred}^i-y_{true}^i|$. It is commonly used for regression and is relatively insensitive to outliers. Its gradient is discontinuous at zero, which can affect optimization.
import torch
import torch.nn as nn
criterion = nn.L1Loss()
y_pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_true = torch.tensor([1.2, 2.2, 3.2])
loss = criterion(y_pred, y_true)
print(loss.item())
MSELoss computes the mean squared error (MSE): $L(y_{pred}, y_{true}) = \frac{1}{n}\sum_{i=1}^{n}(y_{pred}^i - y_{true}^i)^2$. It is used for regression and is fairly sensitive to outliers; the gradient grows with the error, which can make optimization unstable when errors are large.
import torch
import torch.nn as nn
criterion = nn.MSELoss()
y_pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_true = torch.tensor([1.2, 2.2, 3.2])
loss = criterion(y_pred, y_true)
print(loss.item())
SmoothL1Loss combines the advantages of L1Loss and MSELoss: it behaves like MSELoss when the error is small and like L1Loss when the error is large. It is used for regression and is reasonably robust to outliers. The smoothing threshold `beta` should be tuned for the problem at hand.
import torch
import torch.nn as nn
criterion = nn.SmoothL1Loss()
y_pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_true = torch.tensor([1.2, 2.2, 3.2])
loss = criterion(y_pred, y_true)
print(loss.item())
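In recent PyTorch versions the switch point between the quadratic and linear regimes is exposed as the `beta` argument. A minimal sketch (values chosen arbitrarily) of how a smaller `beta` makes the loss behave more like L1Loss for moderate errors:
import torch
import torch.nn as nn
y_pred = torch.tensor([0.0, 0.0, 0.0])
y_true = torch.tensor([0.3, 0.8, 2.0])
criterion_default = nn.SmoothL1Loss()             # beta=1.0
criterion_small_beta = nn.SmoothL1Loss(beta=0.5)  # narrower quadratic region
print(criterion_default(y_pred, y_true).item(), criterion_small_beta(y_pred, y_true).item())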
HuberLoss is closely related to SmoothL1Loss and is another smoothed variant of L1Loss (with delta = 1 the two are identical). It is used for regression and is reasonably robust to outliers; the `delta` parameter must be set.
import torch
import torch.nn as nn
criterion = nn.HuberLoss()
y_pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_true = torch.tensor([1.2, 2.2, 3.2])
loss = criterion(y_pred, y_true)
print(loss.item())
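A minimal sketch of passing `delta` (the value here is arbitrary): errors smaller than delta are penalized quadratically, larger ones linearly.
import torch
import torch.nn as nn
criterion_delta = nn.HuberLoss(delta=0.5)
y_pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_true = torch.tensor([1.2, 2.2, 5.0])
print(criterion_delta(y_pred, y_true).item())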
NLLLoss is the negative log-likelihood loss for multi-class classification. Assuming $y_{pred}$ contains log-probabilities and $y_{true}$ is the true class index, the per-sample loss is $L(y_{pred}, y_{true}) = -y_{pred}[y_{true}]$. The input must be log-probabilities, so it is usually paired with LogSoftmax.
import torch
import torch.nn as nn
criterion = nn.NLLLoss()
y_pred = torch.tensor([[-0.1, -0.2, -0.3], [-0.4, -0.5, -0.6]], requires_grad=True)
y_true = torch.tensor([0, 1])
loss = criterion(y_pred, y_true)
print(loss.item())
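A small sketch of the usual pairing: applying nn.LogSoftmax to raw scores before NLLLoss gives the same value as feeding the raw scores directly to nn.CrossEntropyLoss.
import torch
import torch.nn as nn
logits = torch.tensor([[0.5, 1.5, 0.1], [2.0, 0.3, 0.7]])
targets = torch.tensor([1, 0])
log_softmax = nn.LogSoftmax(dim=1)
print(nn.NLLLoss()(log_softmax(logits), targets).item())
print(nn.CrossEntropyLoss()(logits, targets).item())  # same value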
NLLLoss2d is the same as NLLLoss but intended for multi-class classification over 2-D image data; the input and target dimensions must match. (It has long been deprecated; plain nn.NLLLoss accepts 4-D input directly.)
import torch
import torch.nn as nn
criterion = nn.NLLLoss2d()
y_pred = torch.randn(3, 5, 20, 20).log_softmax(dim=1).detach().requires_grad_()  # log-probabilities over the class dimension
y_true = torch.empty(3, 20, 20, dtype=torch.long).random_(5)
loss = criterion(y_pred, y_true)
print(loss.item())
PoissonNLLLoss is the negative log-likelihood loss under a Poisson assumption: $L(y_{pred}, y_{true}) = \exp(y_{pred}) - y_{true} \cdot y_{pred}$. It is used for regression on count data, such as predicting the number of events. The input can be either raw rates or log rates, controlled by the `log_input` argument.
import torch
import torch.nn as nn
criterion = nn.PoissonNLLLoss()
y_pred = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_true = torch.tensor([1.2, 2.2, 3.2])
loss = criterion(y_pred, y_true)
print(loss.item())
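A small sketch (counts chosen arbitrarily) of the `log_input=False` form, where the prediction is the raw event rate rather than its logarithm:
import torch
import torch.nn as nn
criterion_raw = nn.PoissonNLLLoss(log_input=False)
rate_pred = torch.tensor([2.0, 3.5, 1.0], requires_grad=True)  # predicted event rates
counts = torch.tensor([1.0, 4.0, 0.0])                         # observed counts
print(criterion_raw(rate_pred, counts).item())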
GaussianNLLLoss is the negative log-likelihood under a Gaussian assumption, where the model predicts both a mean and a variance: $L(y_{pred}, y_{true}) = \frac{1}{2}\left(\log(\sigma^2) + \frac{(y_{true} - \mu)^2}{\sigma^2}\right)$. It is used for regression when the data is assumed Gaussian; both the mean and the variance must be supplied.
import torch
import torch.nn as nn
criterion = nn.GaussianNLLLoss()
y_pred_mean = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_pred_var = torch.tensor([0.1, 0.2, 0.3])
y_true = torch.tensor([1.2, 2.2, 3.2])
loss = criterion(y_pred_mean, y_true, y_pred_var)
print(loss.item())
KLDivLoss is the Kullback-Leibler divergence, which measures the difference between two probability distributions: $D_{KL}(P \| Q) = \sum_{i} P(i) \log\frac{P(i)}{Q(i)}$. It is used for distribution-matching problems such as generative models. The input must be log-probabilities and the target must be a probability distribution.
import torch
import torch.nn as nn
criterion = nn.KLDivLoss(reduction='batchmean')
y_pred = torch.log(torch.tensor([[0.1, 0.9], [0.2, 0.8]], requires_grad=True))
y_true = torch.tensor([[0.2, 0.8], [0.3, 0.7]])
loss = criterion(y_pred, y_true)
print(loss.item())
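A sketch of the more common pattern with raw scores, where the input goes through log_softmax and the target through softmax (the student/teacher names here are only illustrative):
import torch
import torch.nn as nn
import torch.nn.functional as F
student_logits = torch.randn(2, 5, requires_grad=True)
teacher_logits = torch.randn(2, 5)
kl = nn.KLDivLoss(reduction='batchmean')
loss = kl(F.log_softmax(student_logits, dim=1), F.softmax(teacher_logits, dim=1))
print(loss.item())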
BCELoss is the binary cross-entropy loss for binary classification: $L(y_{pred}, y_{true}) = -[y_{true} \log(y_{pred}) + (1 - y_{true}) \log(1 - y_{pred})]$. The input must be a probability in [0, 1], so it is usually preceded by a Sigmoid.
import torch
import torch.nn as nn
criterion = nn.BCELoss()
y_pred = torch.tensor([0.1, 0.9], requires_grad=True)
y_true = torch.tensor([0.0, 1.0])  # BCELoss expects float targets
loss = criterion(y_pred, y_true)
print(loss.item())
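A short sketch of the usual pairing with Sigmoid on raw scores; feeding the same raw scores to BCEWithLogitsLoss yields the same value but is numerically more stable:
import torch
import torch.nn as nn
logits = torch.tensor([-1.0, 2.0], requires_grad=True)
targets = torch.tensor([0.0, 1.0])
print(nn.BCELoss()(torch.sigmoid(logits), targets).item())
print(nn.BCEWithLogitsLoss()(logits, targets).item())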
BCEWithLogitsLoss combines Sigmoid and BCELoss in a single step, which avoids numerical instability. It is for binary classification with raw scores as input; no Sigmoid activation is needed beforehand.
import torch
import torch.nn as nn
criterion = nn.BCEWithLogitsLoss()
y_pred = torch.tensor([-1.0, 1.0], requires_grad=True)
y_true = torch.tensor([0.0, 1.0])  # float targets, as with BCELoss
loss = criterion(y_pred, y_true)
print(loss.item())
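A sketch of the `pos_weight` argument, which up-weights the positive class when positives are rare (the weight value below is an arbitrary assumption):
import torch
import torch.nn as nn
pos_weight = torch.tensor([3.0])  # e.g. positives are roughly 3x rarer than negatives
criterion_pw = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.tensor([-1.0, 1.0], requires_grad=True)
targets = torch.tensor([0.0, 1.0])
print(criterion_pw(logits, targets).item())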
TripletMarginLoss is the triplet loss $L(a, p, n) = \max(d(a, p) - d(a, n) + margin, 0)$, where a is the anchor, p a positive sample, and n a negative sample. It is used for similarity (metric) learning such as face recognition; the `margin` parameter must be set.
import torch
import torch.nn as nn
criterion = nn.TripletMarginLoss()
anchor = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
positive = torch.tensor([4.0, 5.0, 6.0])
negative = torch.tensor([7.0, 8.0, 9.0])
loss = criterion(anchor, positive, negative)
print(loss.item())
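A brief sketch of the tunable arguments: `margin` sets the required separation and `p` the norm used for the distance (the values below are arbitrary):
import torch
import torch.nn as nn
criterion_tuned = nn.TripletMarginLoss(margin=0.5, p=2)
anchor = torch.randn(4, 16, requires_grad=True)
positive = torch.randn(4, 16)
negative = torch.randn(4, 16)
print(criterion_tuned(anchor, positive, negative).item())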
TripletMarginWithDistanceLoss is like TripletMarginLoss but lets you supply a custom distance function. It is used for similarity learning such as face recognition; the distance function must be provided.
import torch
import torch.nn as nn
def custom_distance(x1, x2):
return torch.norm(x1 - x2, p=2)
criterion = nn.TripletMarginWithDistanceLoss(distance_function=custom_distance)
anchor = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
positive = torch.tensor([4.0, 5.0, 6.0])
negative = torch.tensor([7.0, 8.0, 9.0])
loss = criterion(anchor, positive, negative)
print(loss.item())
HingeEmbeddingLoss measures whether two inputs are similar or dissimilar given a target of 1 or -1:
$L(y_{pred}, y_{true}) = \begin{cases} y_{pred}, & \text{if } y_{true} = 1 \\ \max(0, margin - y_{pred}), & \text{if } y_{true} = -1 \end{cases}$
It is used for similarity learning such as face recognition; the `margin` parameter must be set.
import torch
import torch.nn as nn
criterion = nn.HingeEmbeddingLoss()
y_pred = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
y_true = torch.tensor([1, -1, 1])
loss = criterion(y_pred, y_true)
print(loss.item())
CosineEmbeddingLoss measures the cosine similarity between two input vectors:
$L(y_{pred}^1, y_{pred}^2, y_{true}) = \begin{cases} 1 - \cos(y_{pred}^1, y_{pred}^2), & \text{if } y_{true} = 1 \\ \max(0, \cos(y_{pred}^1, y_{pred}^2) - margin), & \text{if } y_{true} = -1 \end{cases}$
It is used for similarity learning such as text similarity; the `margin` parameter must be set.
import torch
import torch.nn as nn
criterion = nn.CosineEmbeddingLoss()
y_pred1 = torch.tensor([[1.0, 2.0, 3.0]], requires_grad=True)  # shape (N, D): a 1-D target requires 2-D inputs
y_pred2 = torch.tensor([[4.0, 5.0, 6.0]])
y_true = torch.tensor([1])
loss = criterion(y_pred1, y_pred2, y_true)
print(loss.item())
SoftMarginLoss is a two-class loss based on the soft-margin SVM idea, with targets in {1, -1}: $L(y_{pred}, y_{true}) = \log(1 + \exp(-y_{true} \cdot y_{pred}))$. The input can be raw scores.
import torch
import torch.nn as nn
criterion = nn.SoftMarginLoss()
y_pred = torch.tensor([1.0, -2.0], requires_grad=True)
y_true = torch.tensor([1, -1])
loss = criterion(y_pred, y_true)
print(loss.item())
MultiLabelMarginLoss is for multi-label classification, where each sample may belong to several classes: $L(y_{pred}, y_{true}) = \frac{1}{n}\sum_{i}\sum_{j} \max(0, 1 - (y_{pred}[y_{true}^i] - y_{pred}[j]))$, with j running over the non-target classes. The target contains integer class indices padded with -1. The loss pushes the score of every relevant class above the scores of the irrelevant classes, penalizing violations during training so that samples are assigned to all of their relevant categories.
import torch
import torch.nn as nn
criterion = nn.MultiLabelMarginLoss()
y_pred = torch.tensor([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], requires_grad=True)
y_true = torch.tensor([[0, 1, -1, -1], [1, 2, -1, -1]])
loss = criterion(y_pred, y_true)
print(loss.item())
CrossEntropyLoss combines LogSoftmax and NLLLoss for multi-class classification: $L(y_{pred}, y_{true}) = -\log\left(\frac{\exp(y_{pred}[y_{true}])}{\sum_{j}\exp(y_{pred}[j])}\right)$. The input can be raw scores (logits) and does not need to go through Softmax first.
import torch
import torch.nn as nn
criterion = nn.CrossEntropyLoss()
y_pred = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], requires_grad=True)
y_true = torch.tensor([2, 1])
loss = criterion(y_pred, y_true)
print(loss.item())
# Per-class weights: assume these are the sample counts for each class
class_sample_counts = [5, 3, 6, 4, 8, 5, 7]
total_samples = sum(class_sample_counts)
# Compute a weight for each class (inverse frequency)
weights = [total_samples / (len(class_sample_counts) * count) for count in class_sample_counts]
weights = torch.tensor(weights, dtype=torch.float32)
# Define the weighted loss
ce_loss_fn = nn.CrossEntropyLoss(weight=weights)
# Simulated model outputs and ground-truth labels
batch_size = 4
num_classes = 7
y_pred = torch.randn(batch_size, num_classes, requires_grad=True)
y_true = torch.randint(0, num_classes, (batch_size,))
# Compute the loss
loss = ce_loss_fn(y_pred, y_true)
print(f"Weighted CrossEntropyLoss: {loss.item()}")
Focal Loss is a refinement of CrossEntropyLoss aimed at class imbalance. In few-shot multi-class settings the per-class sample counts can differ widely; Focal Loss introduces a modulating factor that down-weights easy examples so training concentrates on hard ones. The formula is $FL(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the model's predicted probability for the true class and $\gamma$ is the focusing parameter. With scarce, imbalanced data it improves recognition of minority classes and keeps the model from collapsing onto the majority class. $\gamma$ must be chosen with care: a larger $\gamma$ penalizes easy examples more strongly and focuses the model on hard examples, but if it is set too high the model can become overly sensitive to noise.
import torch
import torch.nn as nn
import torch.nn.functional as F
class FocalLoss(nn.Module):
def __init__(self, alpha=1, gamma=2, reduction='mean'):
super(FocalLoss, self).__init__()
self.alpha = alpha
self.gamma = gamma
self.reduction = reduction
def forward(self, inputs, targets):
BCE_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')
pt = torch.exp(-BCE_loss)
F_loss = self.alpha * (1-pt)**self.gamma * BCE_loss
if self.reduction == 'mean':
return torch.mean(F_loss)
elif self.reduction == 'sum':
return torch.sum(F_loss)
else:
return F_loss
# Usage example
criterion = FocalLoss(alpha=0.25, gamma=2)
inputs = torch.randn(3, requires_grad=True)
targets = torch.empty(3).random_(2)
loss = criterion(inputs, targets)
loss.backward()
Role of the modulating factor: the term $(1 - p_t)^{\gamma}$ shrinks the contribution of samples for which $p_t$ is close to 1 (i.e. the model is already confident about them), so the loss concentrates on samples that are hard to classify.
Balancing factor $\alpha$: mainly compensates for the imbalance between positive and negative samples. In object detection, for example, background (negative) samples vastly outnumber foreground (positive) ones, and $\alpha$ rebalances the two (RetinaNet uses $\alpha = 0.25$ together with $\gamma = 2$).
Choosing $\alpha$ and $\gamma$: these two hyperparameters matter a great deal and must be tuned per task. In general, a larger $\gamma$ penalizes easy samples more heavily, while $\alpha$ should be set according to the ratio of positive to negative samples in the dataset.
Comparison with standard cross-entropy: under extreme class imbalance, plain cross-entropy tends to make the model predict the majority class. Focal Loss down-weights easy samples, forcing the model to learn more from borderline samples and improving overall performance. A numeric sketch of this effect follows.
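A tiny numeric sketch (probabilities chosen arbitrarily) of how the modulating factor rescales the per-sample loss relative to plain cross-entropy:
import torch
gamma = 2.0
p_t = torch.tensor([0.95, 0.6, 0.1])  # predicted probability of the true class
ce = -torch.log(p_t)                  # standard cross-entropy per sample
fl = (1 - p_t) ** gamma * ce          # focal loss per sample (alpha omitted)
print(ce.tolist())
print(fl.tolist())  # the confident sample (0.95) is down-weighted far more than the hard one (0.1)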
from torch import nn
import torch
from torch.nn import functional as F
class focal_loss(nn.Module):
def __init__(self, alpha=None, gamma=2, num_classes = 3, size_average=True):
"""
focal_loss损失函数, -α(1-yi)**γ *ce_loss(xi,yi)
步骤详细的实现了 focal_loss损失函数.
:param alpha:类别权重.当α是列表时,为各类别权重,当α为常数时,类别权重为[α, 1-α, 1-α, ....],常用于 目标检测算法中抑制背景类 , retainnet中设置为0.25
:param gamma: 伽马γ,难易样本调节参数. retainnet中设置为2
:param num_classes: 类别数量
:param size_average: 损失计算方式,默认取均值
"""
super(focal_loss,self).__init__()
self.size_average = size_average
if alpha is None:
self.alpha = torch.ones(num_classes)
elif isinstance(alpha,list):
assert len(alpha)==num_classes
            # alpha can be given as a list of size [num_classes] to weight each class individually
self.alpha = torch.Tensor(alpha)
else:
            assert alpha < 1  # if alpha is a scalar, reduce the weight of the first class (the background class in detection)
self.alpha = torch.zeros(num_classes)
self.alpha[0] += alpha
self.alpha[1:] += (1-alpha)
            # alpha ends up as [α, 1-α, 1-α, ...] with size [num_classes]
self.gamma = gamma
print('Focal Loss:')
print(' Alpha = {}'.format(self.alpha))
print(' Gamma = {}'.format(self.gamma))
def forward(self, preds, labels):
"""
focal_loss损失计算
:param preds:预测类别. size:[B,N,C] or [B,C] 分别对应与检测与分类任务, C类别数
:param labels: 实际类别. size:[B,N] or [B]; B 批次, N检测框数,
"""
# assert preds.dim()==2 and labels.dim()==1
preds = preds.view(-1,preds.size(-1))
alpha = self.alpha.to(preds.device)
preds_logsoft = F.log_softmax(preds, dim=1) # log_softmax
preds_softmax = torch.exp(preds_logsoft) # softmax
preds_softmax = preds_softmax.gather(1,labels.view(-1,1))
        # this part implements nll_loss (cross-entropy = log_softmax + nll)
preds_logsoft = preds_logsoft.gather(1,labels.view(-1,1))
alpha = self.alpha.gather(0,labels.view(-1))
loss = -torch.mul(torch.pow((1-preds_softmax), self.gamma), preds_logsoft)
        # torch.pow((1 - preds_softmax), self.gamma) is the (1-pt)**γ term of the focal loss
loss = torch.mul(alpha, loss.t())
if self.size_average:
loss = loss.mean()
else:
loss = loss.sum()
return loss
pred = torch.randn((3,5))
label = torch.tensor([2,3,4])
loss_fn = focal_loss(alpha=0.25, gamma=2, num_classes=5)
loss = loss_fn(pred, label)
loss_fn = focal_loss(alpha=[1,2,3,1,2], gamma=2, num_classes=5)
loss = loss_fn(pred, label)
Label Smoothing Loss mitigates overfitting in classification by smoothing the target distribution so that the model does not become overconfident in any single class, which improves generalization. Concretely, the one-hot label is replaced by the smoothed label $\bar x_i = \begin{cases} 1-\epsilon+\epsilon/K & \text{if } i = y \\ \epsilon/K & \text{otherwise} \end{cases}$, where $K$ is the number of classes, $\epsilon$ the smoothing factor, and $y$ the true class. $\epsilon$ is usually around 0.1: the probability of the correct class is lowered slightly and the probability of every other class is raised slightly. A reasonable starting point is $\epsilon = 0.1$; reduce it if model performance drops. By the definition of cross-entropy, the loss becomes $L_{LS} = -\sum_{i=1}^{K} \bar x_i \log(p_i)$, where $p_i$ is the predicted probability of class $i$. Label smoothing can be applied to most classification tasks, and it is most effective when overfitting is a concern.
import torch
import torch.nn as nn
import torch.nn.functional as F
class LabelSmoothingLoss(nn.Module):
def __init__(self, classes, smoothing=0.0, dim=-1):
super(LabelSmoothingLoss, self).__init__()
self.confidence = 1.0 - smoothing
self.smoothing = smoothing
self.cls = classes
self.dim = dim
def forward(self, pred, target):
pred = pred.log_softmax(dim=self.dim)
with torch.no_grad():
true_dist = torch.zeros_like(pred)
true_dist.fill_(self.smoothing / (self.cls - 1))
true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))
# Usage example
criterion = LabelSmoothingLoss(classes=10, smoothing=0.1)
inputs = torch.randn(3, 10, requires_grad=True)  # assume 10 classes
targets = torch.tensor([1, 2, 3])  # target classes
loss = criterion(inputs, targets)
loss.backward()
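As a side note, recent PyTorch versions (1.10+) expose label smoothing directly through nn.CrossEntropyLoss, which is usually simpler than a custom module. A minimal sketch:
import torch
import torch.nn as nn
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
inputs = torch.randn(3, 10, requires_grad=True)
targets = torch.tensor([1, 2, 3])
loss = criterion(inputs, targets)
print(loss.item())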
MultiMarginLoss is for multi-class classification where each sample belongs to exactly one class. Based on the max-margin classifier idea, the loss is $L(y_{pred}, y_{true}) = \frac{1}{n}\sum_{i}\sum_{j \neq y_{true}^i} \max(0, margin - (y_{pred}[y_{true}^i] - y_{pred}[j]))$. The `margin` parameter must be set. The objective encourages the score of the correct class to exceed the scores of the incorrect classes by at least the margin, so that correct classes receive high scores and incorrect ones low scores; it is typically used when training neural-network classifiers.
import torch
import torch.nn as nn
criterion = nn.MultiMarginLoss()
y_pred = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], requires_grad=True)
y_true = torch.tensor([2, 1])
loss = criterion(y_pred, y_true)
print(loss.item())
MultiLabelSoftMarginLoss is for multi-label classification and is an extension of SoftMarginLoss. The input can be raw scores.
import torch
import torch.nn as nn
criterion = nn.MultiLabelSoftMarginLoss()
y_pred = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], requires_grad=True)
y_true = torch.tensor([[0, 1, 1], [1, 0, 1]])
loss = criterion(y_pred, y_true)
print(loss.item())
MarginRankingLoss is for ranking problems: $L(y_{pred}^1, y_{pred}^2, y_{true}) = \max(0, -y_{true} \cdot (y_{pred}^1 - y_{pred}^2) + margin)$, where $y_{true}$ is 1 if the first input should be ranked higher and -1 otherwise. It is used for ranking, e.g. in recommender systems; the `margin` parameter must be set.
import torch
import torch.nn as nn
criterion = nn.MarginRankingLoss()
y_pred1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y_pred2 = torch.tensor([4.0, 5.0, 6.0])
y_true = torch.tensor([-1, -1, -1])
loss = criterion(y_pred1, y_pred2, y_true)
print(loss.item())
CTCLoss is the Connectionist Temporal Classification loss for sequence labeling problems such as speech recognition and handwriting recognition. The input must be log-probabilities and the targets integer class indices, together with the input and target lengths.
import torch
import torch.nn as nn
criterion = nn.CTCLoss()
log_probs = torch.randn(50, 16, 20).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, 20, (16, 30), dtype=torch.long)
input_lengths = torch.full((16,), 50, dtype=torch.long)
target_lengths = torch.randint(10, 30, (16,), dtype=torch.long)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
Dice Loss is widely used in image segmentation, especially in medical imaging. It is derived from the Dice coefficient (the Sørensen-Dice coefficient), a statistic measuring the overlap between two sets, and it optimizes the model by minimizing the mismatch between predictions and ground truth. The Dice coefficient is $D = \frac{2|X \cap Y|}{|X| + |Y|}$, where X is the prediction and Y the target. As a loss it is usually written as one minus a soft Dice coefficient: $L_{Dice} = 1 - \frac{2\sum X_i Y_i + \epsilon}{\sum X_i^2 + \sum Y_i^2 + \epsilon}$, where $\epsilon$ is a small constant that prevents division by zero. Dice Loss is particularly effective on heavily imbalanced datasets because it directly measures the overlap between predicted and true foreground pixels instead of averaging per-pixel classification errors. Because the denominator contains the sums of predictions and targets, the gradients remain comparatively stable, which helps avoid vanishing gradients. smooth: a small value added to both numerator and denominator to avoid division by zero; 1.0 is usually fine but may need adjusting for specific applications. For binary segmentation apply a sigmoid to the model output before computing Dice Loss; for multi-class segmentation use softmax.
import torch
import torch.nn as nn
class DiceLoss(nn.Module):
def __init__(self, smooth=1.0):
super(DiceLoss, self).__init__()
self.smooth = smooth
def forward(self, y_pred, y_true):
assert y_pred.size() == y_true.size()
y_pred = y_pred.contiguous().view(-1)
y_true = y_true.contiguous().view(-1)
intersection = (y_pred * y_true).sum()
dice_coefficient = (2. * intersection + self.smooth) / (y_pred.sum() + y_true.sum() + self.smooth)
return 1 - dice_coefficient
# Usage example
criterion = DiceLoss(smooth=1.0)
inputs = torch.sigmoid(torch.randn(3, 5, requires_grad=True))  # assume sigmoid outputs
targets = torch.tensor([[0, 1, 0, 1, 0], [1, 0, 1, 0, 1], [0, 0, 0, 0, 1]], dtype=torch.float32)
loss = criterion(inputs, targets)
loss.backward()
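For the multi-class case the paragraph above suggests softmax. A minimal sketch (class count and shapes are arbitrary) that averages a soft Dice loss over one-hot encoded classes; like the module above, it uses the plain-sum denominator rather than the squared-sum variant of the formula:
import torch
import torch.nn.functional as F
def multiclass_dice_loss(logits, targets, smooth=1.0):
    # logits: (N, C, H, W); targets: (N, H, W) with integer class indices
    num_classes = logits.shape[1]
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = (probs * one_hot).sum(dims)
    denominator = probs.sum(dims) + one_hot.sum(dims)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice.mean()
logits = torch.randn(2, 4, 8, 8, requires_grad=True)
targets = torch.randint(0, 4, (2, 8, 8))
print(multiclass_dice_loss(logits, targets).item())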
Center Loss learns a center for each class and pulls each sample's features toward the center of its class. The loss has two parts: a conventional classification loss (such as cross-entropy) plus the sum of distances between sample features and their class centers: $L = L_{softmax} + \lambda\frac{1}{2}\sum_{i=1}^{m}\lVert x_i - c_{y_i}\rVert_2^2$, where $L_{softmax}$ is the softmax loss, $x_i$ the feature of sample $i$, $c_{y_i}$ the center of its class, and $\lambda$ a balancing coefficient. In few-shot multi-class settings it helps the model learn more discriminative feature representations by shrinking intra-class variation and enlarging inter-class separation.
import torch
import torch.nn as nn
import torch.nn.functional as F
class CenterLoss(nn.Module):
def __init__(self, num_classes, feat_dim):
super(CenterLoss, self).__init__()
self.num_classes = num_classes
self.feat_dim = feat_dim
self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
def forward(self, x, labels):
batch_size = x.size(0)
distmat = torch.pow(x, 2).sum(dim=1, keepdim=True).expand(batch_size, self.num_classes) + torch.pow(self.centers, 2).sum(dim=1, keepdim=True).expand(self.num_classes, batch_size).t()
        distmat.addmm_(x, self.centers.t(), beta=1, alpha=-2)  # distmat = ||x||^2 + ||c||^2 - 2*x·c; keyword form replaces the deprecated positional addmm_ signature
classes = torch.arange(self.num_classes).long()
labels = labels.unsqueeze(1).expand(batch_size, self.num_classes)
mask = labels.eq(classes.expand(batch_size, self.num_classes))
dist = []
for i in range(batch_size):
value = distmat[i][mask[i]]
value = value.clamp(min=1e-12, max=1e+12)
dist.append(value)
dist = torch.cat(dist)
loss = dist.mean()
return loss
# Simulated data
x = torch.randn(3, 128)
labels = torch.tensor([1, 3, 0])
center_loss = CenterLoss(num_classes=7, feat_dim=128)
loss = center_loss(x, labels)
print(f"Center Loss: {loss.item()}")
lambda balances the classification loss (e.g. the softmax loss) against the center loss. A larger lambda makes the model focus more on the distance between sample features and class centers, which helps shrink intra-class variation but can hurt classification accuracy; a smaller lambda emphasizes the classification loss and weakens the effect of the center loss. When samples are scarce, the intra-class features may not be very rich, so start with a small lambda such as 0.001 to 0.01 and adjust it according to how the model behaves during training: increase lambda if intra-class variation stays large, and decrease it if classification performance drops noticeably.
feat_dim is the dimensionality of the feature vector. A suitable dimension represents the samples better and improves classification. For simple data a small dimension such as 64 to 128 is enough; for complex data, e.g. images with varied texture and shape information, 256 to 512 may be more appropriate. It is worth trying several dimensions and keeping the one that performs best on the validation set.
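A minimal, hypothetical training-step sketch (the backbone, learning rates, and λ below are placeholders) showing one common way to wire Center Loss together with cross-entropy. The class centers are an nn.Parameter, so they must also be handed to an optimizer, often with their own learning rate:
import torch
import torch.nn as nn
feat_dim, num_classes, lambda_center = 128, 7, 0.01
backbone = nn.Linear(32, feat_dim)            # placeholder feature extractor
classifier = nn.Linear(feat_dim, num_classes)
center_loss = CenterLoss(num_classes=num_classes, feat_dim=feat_dim)  # uses the CenterLoss class defined above
ce_loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    [{'params': backbone.parameters()},
     {'params': classifier.parameters()},
     {'params': center_loss.parameters(), 'lr': 0.5}],  # centers often use a larger lr
    lr=0.01)
inputs = torch.randn(4, 32)
labels = torch.randint(0, num_classes, (4,))
features = backbone(inputs)
logits = classifier(features)
loss = ce_loss(logits, labels) + lambda_center * center_loss(features, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()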
ArcFace Loss explicitly constrains the angles between classes in feature space, directly enlarging the decision margin. Building on the softmax loss, it converts the cosine similarity between the feature vector and the class weight vector into an angle, adds a fixed angular margin m to that angle, and converts the result back to a cosine for classification. With few samples it noticeably improves the discriminative power of the learned features.
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
class ArcFaceLoss(nn.Module):
def __init__(self, in_features, out_features, s=30.0, m=0.50):
super(ArcFaceLoss, self).__init__()
self.in_features = in_features
self.out_features = out_features
self.s = s
self.m = m
self.fc = nn.Linear(in_features, out_features, bias=False)
def forward(self, x, labels):
cosine = F.linear(F.normalize(x), F.normalize(self.fc.weight))
sine = torch.sqrt((1.0 - torch.pow(cosine, 2)).clamp(0, 1))
        phi = cosine * math.cos(self.m) - sine * math.sin(self.m)  # cos(theta + m); self.m is a Python float, so use math.cos/math.sin
phi = phi.type_as(cosine)
one_hot = torch.zeros(cosine.size(), device=x.device)
one_hot.scatter_(1, labels.view(-1, 1).long(), 1)
output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
output *= self.s
loss = F.cross_entropy(output, labels)
return loss
# Simulated data
x = torch.randn(3, 128)
labels = torch.tensor([1, 3, 0])
arcface_loss = ArcFaceLoss(128, 7)
loss = arcface_loss(x, labels)
print(f"ArcFace Loss: {loss.item()}")
Different loss functions can also be combined to exploit their complementary strengths. For example, combining CrossEntropyLoss with TripletMarginLoss lets the model learn both the probability distribution over classes and the similarity structure between samples.
import torch
import torch.nn as nn
# Define model outputs and ground-truth labels
y_pred = torch.randn(3, 5, requires_grad=True)
y_true = torch.tensor([1, 3, 0])
# Define anchor, positive, and negative samples
anchor = torch.randn(3, 10, requires_grad=True)
positive = torch.randn(3, 10)
negative = torch.randn(3, 10)
# Define the loss functions
ce_loss_fn = nn.CrossEntropyLoss()
triplet_loss_fn = nn.TripletMarginLoss()
# Compute the losses
ce_loss = ce_loss_fn(y_pred, y_true)
triplet_loss = triplet_loss_fn(anchor, positive, negative)
# Combine the losses
combined_loss = ce_loss + triplet_loss
print(f"Combined Loss: {combined_loss.item()}")
Center Loss combined with Cross-Entropy Loss: the cross-entropy term keeps the classification correct, while the center loss shrinks intra-class variation and enlarges inter-class separation. Combining the two exploits both strengths and improves classification performance.
import torch
import torch.nn as nn
import torch.nn.functional as F
class CenterLoss(nn.Module):
def __init__(self, num_classes, feat_dim):
super(CenterLoss, self).__init__()
self.num_classes = num_classes
self.feat_dim = feat_dim
self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
def forward(self, x, labels):
batch_size = x.size(0)
distmat = torch.pow(x, 2).sum(dim=1, keepdim=True).expand(batch_size, self.num_classes) + torch.pow(self.centers, 2).sum(dim=1, keepdim=True).expand(self.num_classes, batch_size).t()
        distmat.addmm_(x, self.centers.t(), beta=1, alpha=-2)  # keyword form replaces the deprecated positional addmm_ signature
classes = torch.arange(self.num_classes).long()
labels = labels.unsqueeze(1).expand(batch_size, self.num_classes)
mask = labels.eq(classes.expand(batch_size, self.num_classes))
dist = []
for i in range(batch_size):
value = distmat[i][mask[i]]
value = value.clamp(min=1e-12, max=1e+12)
dist.append(value)
dist = torch.cat(dist)
loss = dist.mean()
return loss
# Simulated data
batch_size = 4
num_classes = 7
feat_dim = 128
x = torch.randn(batch_size, feat_dim)  # feature vectors
y_pred = torch.randn(batch_size, num_classes)  # model predictions
y_true = torch.randint(0, num_classes, (batch_size,))  # ground-truth labels
# Define the loss functions
cross_entropy_loss = nn.CrossEntropyLoss()
center_loss = CenterLoss(num_classes, feat_dim)
lambda_center = 0.01  # weight of the Center Loss term
# Compute the losses
ce_loss = cross_entropy_loss(y_pred, y_true)
center_loss_value = center_loss(x, y_true)
combined_loss = ce_loss + lambda_center * center_loss_value
print(f"Combined Loss: {combined_loss.item()}")
Large Margin Softmax Loss combined with Focal Loss: the large-margin term increases inter-class separation, while Focal Loss addresses the class imbalance typical of few-shot settings. Combining the two improves class separation while still paying attention to minority-class samples.
import torch
import torch.nn as nn
import torch.nn.functional as F
# Custom Large Margin Softmax Loss module
class LargeMarginSoftmaxLoss(nn.Module):
def __init__(self, in_features, out_features, margin=2):
super(LargeMarginSoftmaxLoss, self).__init__()
self.fc = nn.Linear(in_features, out_features, bias=False)
self.margin = margin
def forward(self, x, labels):
w = self.fc.weight
x_norm = F.normalize(x, p=2, dim=1)
w_norm = F.normalize(w, p=2, dim=1)
cos_theta = F.linear(x_norm, w_norm)
theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
target_logit = torch.cos(self.margin * theta)
one_hot = torch.zeros_like(cos_theta)
one_hot.scatter_(1, labels.view(-1, 1).long(), 1)
output = one_hot * target_logit + (1 - one_hot) * cos_theta
log_prob = F.log_softmax(output, dim=1)
loss = -torch.mean(torch.sum(log_prob * one_hot, dim=1))
return loss
# Custom Focal Loss module
class FocalLoss(nn.Module):
def __init__(self, alpha=1, gamma=2, reduction='mean'):
super(FocalLoss, self).__init__()
self.alpha = alpha
self.gamma = gamma
self.reduction = reduction
def forward(self, inputs, targets):
ce_loss = F.cross_entropy(inputs, targets, reduction='none')
pt = torch.exp(-ce_loss)
focal_loss = self.alpha * (1 - pt) ** self.gamma * ce_loss
if self.reduction == 'mean':
return focal_loss.mean()
elif self.reduction == 'sum':
return focal_loss.sum()
else:
return focal_loss
# Simulated data
batch_size = 4
in_features = 128
num_classes = 7
x = torch.randn(batch_size, in_features)  # feature vectors
y_pred = torch.randn(batch_size, num_classes)  # model predictions
y_true = torch.randint(0, num_classes, (batch_size,))  # ground-truth labels
# Define the loss functions
lms_loss = LargeMarginSoftmaxLoss(in_features, num_classes)
focal_loss = FocalLoss()
lambda_focal = 0.5  # weight of the Focal Loss term
# Compute the losses
lms_loss_value = lms_loss(x, y_true)
focal_loss_value = focal_loss(y_pred, y_true)
combined_loss = lms_loss_value + lambda_focal * focal_loss_value
print(f"Combined Loss: {combined_loss.item()}")
Loss-function parameters can also be adjusted dynamically according to the data and the state of training. For example, use a larger margin early in training to strengthen the model's discriminative power, then gradually shrink it later so that training stabilizes.
import torch
import torch.nn as nn
# Simulated training loop
num_epochs = 10
margin_start = 1.0
margin_end = 0.1
y_pred = torch.randn(3, 5, requires_grad=True)
y_true = torch.tensor([1, 3, 0])
for epoch in range(num_epochs):
    # dynamically adjust the margin
margin = margin_start - (margin_start - margin_end) * (epoch / num_epochs)
loss_fn = nn.MultiMarginLoss(margin=margin)
loss = loss_fn(y_pred, y_true)
print(f"Epoch {epoch+1}, Margin: {margin}, Loss: {loss.item()}")