There are several ways to decay the learning rate. Here we use optim.lr_scheduler.ReduceLROnPlateau, which lowers the learning rate once the monitored metric stops improving, e.g. when the loss no longer decreases or the accuracy no longer rises.
import torch
import torch.nn as nn
import torch.optim as optim
model = GRU().to(device)  # GRU model and device are defined earlier
criterion = nn.CrossEntropyLoss().to(device)
optimizer = optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-4)  # weight_decay is the weight-decay coefficient (decoupled in AdamW, playing the role of L2 regularization); betas=(0.9, 0.888) could also be passed
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=20, verbose=True)  # halve the LR after 20 epochs without improvement
Add the following lines inside the training loop, after computing the average validation loss for the epoch:
current_lr = scheduler.optimizer.param_groups[0]['lr']  # learning rate currently in use
print(f'Current Learning Rate: {current_lr}')
scheduler.step(avg_val_loss)  # pass the monitored metric; the LR is reduced once it stops improving for `patience` epochs
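For context, here is a minimal sketch of how these lines fit into a typical train/validation loop. It reuses the model, criterion, optimizer, and scheduler created above; num_epochs, train_loader, and val_loader are placeholder names that do not appear in the original code.

for epoch in range(num_epochs):  # num_epochs, train_loader, val_loader are placeholders
    # training pass
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    # validation pass: accumulate the loss that the scheduler will monitor
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            val_loss += criterion(model(x), y).item()
    avg_val_loss = val_loss / len(val_loader)

    current_lr = scheduler.optimizer.param_groups[0]['lr']
    print(f'Epoch {epoch}: val loss {avg_val_loss:.4f}, lr {current_lr}')
    scheduler.step(avg_val_loss)  # ReduceLROnPlateau halves the LR once the val loss plateaus

With mode='min' the scheduler watches a quantity that should decrease (here the validation loss); to monitor accuracy instead, use mode='max' and pass the accuracy to scheduler.step().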
References:
https://blog.csdn.net/lihuanyu520/article/details/132161165
https://blog.csdn.net/emperinter/article/details/108917935