ShuffleNet for Image Classification
This example does not support running in static graph mode on GPU devices; all other modes are supported.
Introduction to ShuffleNet
ShuffleNetV1, proposed by Megvii, is a computation-efficient CNN model. Like MobileNet and SqueezeNet, it mainly targets mobile devices, so its design goal is to achieve the best possible accuracy with limited computing resources. The core of ShuffleNetV1 is the introduction of two operations, Pointwise Group Convolution and Channel Shuffle, which greatly reduce the model's computational cost while maintaining accuracy. ShuffleNetV1 is thus similar to MobileNet in that both compress and accelerate the model by designing a more efficient network structure.
For more details on ShuffleNet, see the paper ShuffleNet.
As shown in the figure below, ShuffleNet brings the parameter count down to nearly the minimum while maintaining competitive accuracy, so it runs fast and each parameter contributes strongly to model accuracy.
Image source: Bianco S, Cadene R, Celona L, et al. Benchmark analysis of representative deep neural network architectures[J]. IEEE Access, 2018, 6: 64270-64277.
In short, ShuffleNet reduces the model's computational cost while preserving accuracy through the Pointwise Group Convolution and Channel Shuffle operations.
Model Architecture
ShuffleNet's most distinctive feature is rearranging channels across groups to remedy the drawback of Group Convolution. By improving the Bottleneck unit of ResNet, it achieves high accuracy at a small computational cost.
Pointwise Group Convolution
The principle of Group Convolution is shown in the figure below. Compared with an ordinary convolution, a grouped convolution with g groups uses kernels of size in_channels/g × k × k within each group, so across all groups there are (in_channels/g × k × k) × out_channels parameters, i.e. 1/g of an ordinary convolution. In a grouped convolution, each kernel only processes a subset of the input feature map's channels; the benefit is a reduced parameter count, while the number of output channels still equals the number of kernels.
Image source: Huang G, Liu S, Van der Maaten L, et al. Condensenet: An efficient densenet using learned group convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 2752-2761.
Depthwise Convolution sets the number of groups g equal to the number of input channels in_channels and convolves each channel separately, so each kernel processes exactly one channel. With kernel size 1 × k × k, the parameter count is in_channels × k × k, and the resulting feature maps have the same number of channels as the input.
Pointwise Group Convolution builds on grouped convolution by fixing each group's kernel size to 1×1, giving (in_channels/g × 1 × 1) × out_channels parameters.
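The parameter counts above can be checked with a few lines of arithmetic (the channel numbers below are illustrative, borrowed from the Stage2 width of the g = 3 network):

```python
def conv_params(in_ch, out_ch, k, groups=1):
    # Weight count of a (grouped) k x k convolution, bias ignored:
    # each of the out_ch filters sees only in_ch // groups input channels.
    return (in_ch // groups) * k * k * out_ch

in_ch, out_ch, k, g = 240, 240, 3, 3
standard  = conv_params(in_ch, out_ch, k)           # ordinary convolution
grouped   = conv_params(in_ch, out_ch, k, g)        # 1/g of the standard count
depthwise = conv_params(in_ch, in_ch, k, in_ch)     # groups == in_channels
pw_group  = conv_params(in_ch, out_ch, 1, g)        # 1x1 pointwise group conv

assert grouped * g == standard                      # grouped conv saves a factor of g
print(standard, grouped, depthwise, pw_group)       # 518400 172800 2160 19200
```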
%%capture captured_output
# The experiment environment comes with mindspore==2.2.14 pre-installed;
# to switch mindspore versions, change the version number below
!pip uninstall mindspore -y
!pip install -i https://pypi.mirrors.ustc.edu.cn/simple mindspore==2.2.14
# Check the current mindspore version
!pip show mindspore
Name: mindspore
Version: 2.2.14
Summary: MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
Home-page: https://www.mindspore.cn
Author: The MindSpore Authors
Author-email: contact@mindspore.cn
License: Apache 2.0
Location: /home/nginx/miniconda/envs/jupyter/lib/python3.9/site-packages
Requires: asttokens, astunparse, numpy, packaging, pillow, protobuf, psutil, scipy
Required-by:
from mindspore import nn
import mindspore.ops as ops
from mindspore import Tensor
class GroupConv(nn.Cell):
    """Group convolution: split the input channels into `groups` parts,
    convolve each part independently, then concatenate the outputs."""
    def __init__(self, in_channels, out_channels, kernel_size,
                 stride, pad_mode="pad", pad=0, groups=1, has_bias=False):
        super(GroupConv, self).__init__()
        self.groups = groups
        self.convs = nn.CellList()
        for _ in range(groups):
            self.convs.append(nn.Conv2d(in_channels // groups, out_channels // groups,
                                        kernel_size=kernel_size, stride=stride, has_bias=has_bias,
                                        padding=pad, pad_mode=pad_mode, group=1, weight_init='xavier_uniform'))

    def construct(self, x):
        # Split along the channel axis, run each group's convolution, concatenate
        features = ops.split(x, split_size_or_sections=int(len(x[0]) // self.groups), axis=1)
        outputs = ()
        for i in range(self.groups):
            outputs = outputs + (self.convs[i](features[i].astype("float32")),)
        out = ops.cat(outputs, axis=1)
        return out
Pointwise Group Convolution combines pointwise convolution (Pointwise Convolution) and group convolution (Group Convolution): the input feature map is split into groups, a 1×1 pointwise convolution is applied within each group, and the per-group results are concatenated. The GroupConv class implements Pointwise Group Convolution. Note that channels in different groups remain isolated from one another.
Channel Shuffle
The drawback of Group Convolution is that channels in different groups cannot exchange information: after stacking GConv layers, the feature maps of different groups never communicate, as if split into g unrelated roads with everyone walking their own way, which can weaken the network's feature-extraction ability. This is also why networks such as Xception and MobileNet use dense 1×1 convolutions (Dense Pointwise Convolution).
To break this "inbreeding" among same-group channels, ShuffleNet dispenses with the large number of dense 1×1 convolutions (which, where used, account for a striking 93.4% of the computation) and introduces the Channel Shuffle mechanism (channel rearrangement). Intuitively, this operation evenly scatters and regroups the channels of the different groups, so that the next layer can process information from all groups.
As shown in the figure below, for g groups with n channels each, the channels are first reshaped into a matrix of g rows and n columns, then transposed into n rows and g columns, and finally flattened to obtain the new channel order. These operations are all differentiable and computationally simple, enabling inter-group information exchange while staying true to ShuffleNet's lightweight design.
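The reshape-transpose-flatten trick can be demonstrated on plain channel indices (a pure-Python sketch; `channel_shuffle` here is an illustrative helper, not the network code below):

```python
def channel_shuffle(channels, groups):
    # Reshape the flat channel list into (groups, n), transpose to (n, groups),
    # then flatten -- mirroring the reshape/transpose/reshape in the block code.
    n = len(channels) // groups
    grid = [channels[g * n:(g + 1) * n] for g in range(groups)]  # g rows, n cols
    transposed = list(zip(*grid))                                # n rows, g cols
    return [c for row in transposed for c in row]                # flatten

# 6 channels in 3 groups: [0,1 | 2,3 | 4,5] -> [0, 2, 4, 1, 3, 5]
print(channel_shuffle(list(range(6)), 3))
```

After the shuffle, every run of g consecutive channels contains one channel from each original group, so the next grouped convolution mixes information across all groups.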
For readability, the Channel Shuffle implementation is placed inside the ShuffleNet block code below.
Channel Shuffle rearranges channels to resolve the information isolation between channel groups.
The ShuffleNet Block
As shown in the figure below, ShuffleNet modifies the Bottleneck structure of ResNet from (a) into (b) and (c):

- Replace the 1×1 convolution modules at the beginning and end (dimension reduction and expansion) with Pointwise Group Convolution;
- Insert a Channel Shuffle after the dimension reduction so that information can flow between channels of different groups;
- In the downsampling module, set the stride of the 3×3 Depthwise Convolution to 2, halving the feature-map height and width; the shortcut accordingly uses a 3×3 average pooling with stride 2, and the element-wise addition is replaced by concatenation.
class ShuffleV1Block(nn.Cell):
    def __init__(self, inp, oup, group, first_group, mid_channels, ksize, stride):
        super(ShuffleV1Block, self).__init__()
        self.stride = stride
        pad = ksize // 2
        self.group = group
        if stride == 2:
            # Downsampling block: the shortcut is concatenated, so the main
            # branch only needs to produce oup - inp channels
            outputs = oup - inp
        else:
            outputs = oup
        self.relu = nn.ReLU()
        branch_main_1 = [
            # 1x1 pointwise group convolution (dimension reduction)
            GroupConv(in_channels=inp, out_channels=mid_channels,
                      kernel_size=1, stride=1, pad_mode="pad", pad=0,
                      groups=1 if first_group else group),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(),
        ]
        branch_main_2 = [
            # 3x3 depthwise convolution (group == channel count)
            nn.Conv2d(mid_channels, mid_channels, kernel_size=ksize, stride=stride,
                      pad_mode='pad', padding=pad, group=mid_channels,
                      weight_init='xavier_uniform', has_bias=False),
            nn.BatchNorm2d(mid_channels),
            # 1x1 pointwise group convolution (dimension expansion)
            GroupConv(in_channels=mid_channels, out_channels=outputs,
                      kernel_size=1, stride=1, pad_mode="pad", pad=0,
                      groups=group),
            nn.BatchNorm2d(outputs),
        ]
        self.branch_main_1 = nn.SequentialCell(branch_main_1)
        self.branch_main_2 = nn.SequentialCell(branch_main_2)
        if stride == 2:
            self.branch_proj = nn.AvgPool2d(kernel_size=3, stride=2, pad_mode='same')

    def construct(self, old_x):
        left = old_x
        right = old_x
        out = old_x
        right = self.branch_main_1(right)
        if self.group > 1:
            right = self.channel_shuffle(right)
        right = self.branch_main_2(right)
        if self.stride == 1:
            out = self.relu(left + right)
        elif self.stride == 2:
            left = self.branch_proj(left)
            out = ops.cat((left, right), 1)
            out = self.relu(out)
        return out

    def channel_shuffle(self, x):
        batchsize, num_channels, height, width = ops.shape(x)
        group_channels = num_channels // self.group
        x = ops.reshape(x, (batchsize, group_channels, self.group, height, width))
        x = ops.transpose(x, (0, 2, 1, 3, 4))
        x = ops.reshape(x, (batchsize, num_channels, height, width))
        return x
The ShuffleV1Block class implements the basic building block of ShuffleNet V1, and its channel_shuffle method performs the channel rearrangement. Within ShuffleV1Block, left and right are the outputs of the two branches of the ShuffleNet V1 architecture, a modification of the residual-connection-based Bottleneck structure.
Building the ShuffleNet Network
The ShuffleNet network structure is shown below. Taking a 224×224 input image and 3 groups (g = 3) as an example: the input first passes through a convolution layer with 24 kernels of size 3×3 and stride 2, producing a 112×112 feature map with 24 channels; a max-pooling layer with stride 2 then reduces it to 56×56 with the channel count unchanged. Next come three stacked stages of ShuffleNet blocks (Stage2, Stage3, Stage4), repeated 4, 8, and 4 times respectively; each stage begins with a downsampling block (figure (c) above) that halves the feature-map height and width and doubles the channels (except the downsampling block of Stage2, which raises the channel count from 24 to 240). Finally, global average pooling produces a 1×1×960 output, followed by a fully connected layer and softmax to obtain the classification probabilities.
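The spatial sizes in this walkthrough can be traced with a short sketch (pure Python; the channel numbers are taken from the g = 3, 1.0x configuration used in the example):

```python
# Trace feature-map size and channel count through ShuffleNetV1
# (g = 3, model size 1.0x), matching the 224x224 walkthrough above.
stage_out = [24, 240, 480, 960]        # first conv, then Stage2..Stage4
size, ch = 224, 3
size, ch = size // 2, stage_out[0]     # 3x3 conv, stride 2   -> 112x112x24
size = size // 2                       # 3x3 max pool, stride 2 -> 56x56x24
for out_ch in stage_out[1:]:
    size, ch = size // 2, out_ch       # each stage starts with a stride-2 block
    print(f"{size}x{size}x{ch}")       # 28x28x240, 14x14x480, 7x7x960
# Global average pooling then reduces 7x7x960 to 1x1x960.
```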
class ShuffleNetV1(nn.Cell):
    def __init__(self, n_class=1000, model_size='2.0x', group=3):
        super(ShuffleNetV1, self).__init__()
        print('model size is ', model_size)
        self.stage_repeats = [4, 8, 4]
        self.model_size = model_size
        if group == 3:
            if model_size == '0.5x':
                self.stage_out_channels = [-1, 12, 120, 240, 480]
            elif model_size == '1.0x':
                self.stage_out_channels = [-1, 24, 240, 480, 960]
            elif model_size == '1.5x':
                self.stage_out_channels = [-1, 24, 360, 720, 1440]
            elif model_size == '2.0x':
                self.stage_out_channels = [-1, 48, 480, 960, 1920]
            else:
                raise NotImplementedError
        elif group == 8:
            if model_size == '0.5x':
                self.stage_out_channels = [-1, 16, 192, 384, 768]
            elif model_size == '1.0x':
                self.stage_out_channels = [-1, 24, 384, 768, 1536]
            elif model_size == '1.5x':
                self.stage_out_channels = [-1, 24, 576, 1152, 2304]
            elif model_size == '2.0x':
                self.stage_out_channels = [-1, 48, 768, 1536, 3072]
            else:
                raise NotImplementedError
        input_channel = self.stage_out_channels[1]
        self.first_conv = nn.SequentialCell(
            nn.Conv2d(3, input_channel, 3, 2, 'pad', 1, weight_init='xavier_uniform', has_bias=False),
            nn.BatchNorm2d(input_channel),
            nn.ReLU(),
        )
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')
        features = []
        for idxstage in range(len(self.stage_repeats)):
            numrepeat = self.stage_repeats[idxstage]
            output_channel = self.stage_out_channels[idxstage + 2]
            for i in range(numrepeat):
                stride = 2 if i == 0 else 1
                first_group = idxstage == 0 and i == 0
                features.append(ShuffleV1Block(input_channel, output_channel,
                                               group=group, first_group=first_group,
                                               mid_channels=output_channel // 4, ksize=3, stride=stride))
                input_channel = output_channel
        self.features = nn.SequentialCell(features)
        self.globalpool = nn.AvgPool2d(7)
        self.classifier = nn.Dense(self.stage_out_channels[-1], n_class)

    def construct(self, x):
        x = self.first_conv(x)
        x = self.maxpool(x)
        x = self.features(x)
        x = self.globalpool(x)
        x = ops.reshape(x, (-1, self.stage_out_channels[-1]))
        x = self.classifier(x)
        return x
The ShuffleNetV1 class builds the complete ShuffleNet network from several stages of ShuffleV1Block. The output channels of each stage are initialized according to model_size and group. It creates the first convolution module first_conv and a max-pooling layer maxpool, then builds the stages in a loop; each stage contains several ShuffleV1Block instances, and the stride-2 block at the start of each stage halves the feature-map size. Global average pooling (globalpool), a reshape of the feature map, and the fully connected layer classifier produce the classification prediction.
Model Training and Evaluation
ShuffleNet is pre-trained on the CIFAR-10 dataset.
Preparing and Loading the Training Set
CIFAR-10 contains 60,000 32×32 color images evenly distributed over 10 classes, with 50,000 images for training and 10,000 for testing. The example below uses the mindspore.dataset.Cifar10Dataset interface to download and load the CIFAR-10 training set. Currently only the binary version (CIFAR-10 binary version) is supported.
from download import download
url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz"
download(url, "./dataset", kind="tar.gz", replace=True)
Creating data folder...
Downloading data from https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz (162.2 MB)
file_sizes: 100%|█████████████████████████████| 170M/170M [00:00<00:00, 183MB/s]
Extracting tar.gz file...
Successfully downloaded / unzipped to ./dataset
'./dataset'
import mindspore as ms
from mindspore.dataset import Cifar10Dataset
from mindspore.dataset import vision, transforms

def get_dataset(train_dataset_path, batch_size, usage):
    image_trans = []
    if usage == "train":
        image_trans = [
            vision.RandomCrop((32, 32), (4, 4, 4, 4)),
            vision.RandomHorizontalFlip(prob=0.5),
            vision.Resize((224, 224)),
            vision.Rescale(1.0 / 255.0, 0.0),
            vision.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
            vision.HWC2CHW()
        ]
    elif usage == "test":
        image_trans = [
            vision.Resize((224, 224)),
            vision.Rescale(1.0 / 255.0, 0.0),
            vision.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
            vision.HWC2CHW()
        ]
    label_trans = transforms.TypeCast(ms.int32)
    dataset = Cifar10Dataset(train_dataset_path, usage=usage, shuffle=True)
    dataset = dataset.map(image_trans, 'image')
    dataset = dataset.map(label_trans, 'label')
    dataset = dataset.batch(batch_size, drop_remainder=True)
    return dataset

dataset = get_dataset("./dataset/cifar-10-batches-bin", 128, "train")
batches_per_epoch = dataset.get_dataset_size()
The download function fetches the CIFAR-10 dataset. The get_dataset function preprocesses the data, applying random cropping, flipping, resizing, and normalization, and splits it into batches.
Model Training
This section pre-trains the network from randomly initialized parameters. First, ShuffleNetV1 is instantiated with model size "2.0x" to define the network. The loss function is cross-entropy; after a 4-epoch warmup, the learning rate follows cosine annealing, and the optimizer is Momentum. Finally, the Model interface from mindspore.train wraps the network, loss function, and optimizer into model, and model.train() trains the network. Passing ModelCheckpoint, CheckpointConfig, TimeMonitor, and LossMonitor as callbacks prints the training epoch, loss, and elapsed time, and saves ckpt files to the current directory.
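The warmup-plus-cosine schedule described here can be sketched in pure Python (warmup_cosine_lr is a hypothetical helper for illustration, not MindSpore's cosine_decay_lr):

```python
import math

def warmup_cosine_lr(step, steps_per_epoch, total_epochs,
                     base_lr=0.05, min_lr=0.0005, warmup_epochs=4):
    # Linear warmup for warmup_epochs, then cosine decay from base_lr
    # down to min_lr over the remaining steps.
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

schedule = [warmup_cosine_lr(s, steps_per_epoch=390, total_epochs=250)
            for s in range(250 * 390)]
print(schedule[0], schedule[4 * 390], schedule[-1])
```

The rate climbs to base_lr by the end of the warmup and decays toward min_lr at the final step.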
import time
import mindspore
import numpy as np
from mindspore import Tensor, nn
from mindspore.train import ModelCheckpoint, CheckpointConfig, TimeMonitor, LossMonitor, Model, Top1CategoricalAccuracy, Top5CategoricalAccuracy

def train():
    mindspore.set_context(mode=mindspore.PYNATIVE_MODE, device_target="Ascend")
    net = ShuffleNetV1(model_size="2.0x", n_class=10)
    loss = nn.CrossEntropyLoss(weight=None, reduction='mean', label_smoothing=0.1)
    min_lr = 0.0005
    base_lr = 0.05
    lr_scheduler = mindspore.nn.cosine_decay_lr(min_lr,
                                                base_lr,
                                                batches_per_epoch * 250,
                                                batches_per_epoch,
                                                decay_epoch=250)
    lr = Tensor(lr_scheduler[-1])
    optimizer = nn.Momentum(params=net.trainable_params(), learning_rate=lr, momentum=0.9, weight_decay=0.00004, loss_scale=1024)
    loss_scale_manager = ms.amp.FixedLossScaleManager(1024, drop_overflow_update=False)
    model = Model(net, loss_fn=loss, optimizer=optimizer, amp_level="O3", loss_scale_manager=loss_scale_manager)
    callback = [TimeMonitor(), LossMonitor()]
    save_ckpt_path = "./"
    config_ckpt = CheckpointConfig(save_checkpoint_steps=batches_per_epoch, keep_checkpoint_max=5)
    ckpt_callback = ModelCheckpoint("shufflenetv1", directory=save_ckpt_path, config=config_ckpt)
    callback += [ckpt_callback]
    print("============== Starting Training ==============")
    start_time = time.time()
    # For time reasons, epoch = 5 here; adjust as needed
    model.train(5, dataset, callbacks=callback)
    use_time = time.time() - start_time
    hour = str(int(use_time // 60 // 60))
    minute = str(int(use_time // 60 % 60))
    second = str(int(use_time % 60))
    print("total time:" + hour + "h " + minute + "m " + second + "s")
    print("============== Train Success ==============")

if __name__ == '__main__':
    train()
model size is  2.0x
============== Starting Training ==============
epoch: 1 step: 1, loss is 2.487356185913086
epoch: 1 step: 2, loss is 2.4824328422546387
epoch: 1 step: 3, loss is 2.360534191131592
...
epoch: 1 step: 390, loss is 1.8723918199539185
Train epoch time: 521838.712 ms, per step time: 1338.048 ms
epoch: 2 step: 1, loss is 1.834862232208252
...
(per-step loss log truncated)
loss is 1.8357572555541992 epoch: 2 step: 85, loss is 1.7819247245788574 epoch: 2 step: 86, loss is 1.8400955200195312 epoch: 2 step: 87, loss is 1.7577227354049683 epoch: 2 step: 88, loss is 1.7630740404129028 epoch: 2 step: 89, loss is 1.7891596555709839 epoch: 2 step: 90, loss is 1.9814701080322266 epoch: 2 step: 91, loss is 1.7651830911636353 epoch: 2 step: 92, loss is 1.7697455883026123 epoch: 2 step: 93, loss is 1.8312571048736572 epoch: 2 step: 94, loss is 1.7942986488342285 epoch: 2 step: 95, loss is 1.7663031816482544 epoch: 2 step: 96, loss is 1.8291752338409424 epoch: 2 step: 97, loss is 1.8728686571121216 epoch: 2 step: 98, loss is 1.8407989740371704 epoch: 2 step: 99, loss is 1.853529453277588 epoch: 2 step: 100, loss is 1.7540059089660645 epoch: 2 step: 101, loss is 1.7780358791351318 epoch: 2 step: 102, loss is 1.7782604694366455 epoch: 2 step: 103, loss is 1.809798240661621 epoch: 2 step: 104, loss is 1.968491554260254 epoch: 2 step: 105, loss is 1.8270950317382812 epoch: 2 step: 106, loss is 1.729768991470337 epoch: 2 step: 107, loss is 1.9185408353805542 epoch: 2 step: 108, loss is 1.8389308452606201 epoch: 2 step: 109, loss is 1.9074642658233643 epoch: 2 step: 110, loss is 1.7574139833450317 epoch: 2 step: 111, loss is 1.7461507320404053 epoch: 2 step: 112, loss is 1.81948721408844 epoch: 2 step: 113, loss is 1.7444366216659546 epoch: 2 step: 114, loss is 1.9319080114364624 epoch: 2 step: 115, loss is 1.7952524423599243 epoch: 2 step: 116, loss is 1.808004379272461 epoch: 2 step: 117, loss is 1.8550138473510742 epoch: 2 step: 118, loss is 1.7955305576324463 epoch: 2 step: 119, loss is 1.8233920335769653 epoch: 2 step: 120, loss is 1.9276678562164307 epoch: 2 step: 121, loss is 1.8658866882324219 epoch: 2 step: 122, loss is 1.751840591430664 epoch: 2 step: 123, loss is 1.8440660238265991 epoch: 2 step: 124, loss is 1.8113787174224854 epoch: 2 step: 125, loss is 1.7702703475952148 epoch: 2 step: 126, loss is 1.8395698070526123 epoch: 2 step: 127, 
loss is 1.8186851739883423 epoch: 2 step: 128, loss is 1.8916821479797363 epoch: 2 step: 129, loss is 1.740739107131958 epoch: 2 step: 130, loss is 1.8140079975128174 epoch: 2 step: 131, loss is 1.8821009397506714 epoch: 2 step: 132, loss is 1.974631667137146 epoch: 2 step: 133, loss is 1.7252826690673828 epoch: 2 step: 134, loss is 1.7294714450836182 epoch: 2 step: 135, loss is 1.746224284172058 epoch: 2 step: 136, loss is 1.835279941558838 epoch: 2 step: 137, loss is 1.8112125396728516 epoch: 2 step: 138, loss is 1.8524725437164307 epoch: 2 step: 139, loss is 1.8074004650115967 epoch: 2 step: 140, loss is 1.7848730087280273 epoch: 2 step: 141, loss is 1.7558326721191406 epoch: 2 step: 142, loss is 1.8558393716812134 epoch: 2 step: 143, loss is 1.8726437091827393 epoch: 2 step: 144, loss is 1.8240671157836914 epoch: 2 step: 145, loss is 1.7925260066986084 epoch: 2 step: 146, loss is 1.7609148025512695 epoch: 2 step: 147, loss is 1.817206621170044 epoch: 2 step: 148, loss is 1.8012206554412842 epoch: 2 step: 149, loss is 1.7344681024551392 epoch: 2 step: 150, loss is 1.8545403480529785 epoch: 2 step: 151, loss is 1.9346392154693604 epoch: 2 step: 152, loss is 1.7847907543182373 epoch: 2 step: 153, loss is 1.7996902465820312 epoch: 2 step: 154, loss is 1.8361784219741821 epoch: 2 step: 155, loss is 1.802649974822998 epoch: 2 step: 156, loss is 1.8078819513320923 epoch: 2 step: 157, loss is 1.7052006721496582 epoch: 2 step: 158, loss is 1.7516093254089355 epoch: 2 step: 159, loss is 1.8032681941986084 epoch: 2 step: 160, loss is 1.8555339574813843 epoch: 2 step: 161, loss is 1.8104159832000732 epoch: 2 step: 162, loss is 1.7981441020965576 epoch: 2 step: 163, loss is 1.7937678098678589 epoch: 2 step: 164, loss is 1.8369343280792236 epoch: 2 step: 165, loss is 1.7590316534042358 epoch: 2 step: 166, loss is 1.903198003768921 epoch: 2 step: 167, loss is 1.809228777885437 epoch: 2 step: 168, loss is 1.7207717895507812 epoch: 2 step: 169, loss is 1.6803689002990723 epoch: 
2 step: 170, loss is 1.7886427640914917 epoch: 2 step: 171, loss is 1.7613189220428467 epoch: 2 step: 172, loss is 1.8411223888397217 epoch: 2 step: 173, loss is 1.789926290512085 epoch: 2 step: 174, loss is 1.8194866180419922 epoch: 2 step: 175, loss is 1.7798203229904175 epoch: 2 step: 176, loss is 1.7686288356781006 epoch: 2 step: 177, loss is 1.7124934196472168 epoch: 2 step: 178, loss is 1.8090981245040894 epoch: 2 step: 179, loss is 1.8615831136703491 epoch: 2 step: 180, loss is 1.9105007648468018 epoch: 2 step: 181, loss is 1.7555516958236694 epoch: 2 step: 182, loss is 1.8180744647979736 epoch: 2 step: 183, loss is 1.7644696235656738 epoch: 2 step: 184, loss is 1.7582026720046997 epoch: 2 step: 185, loss is 1.7807061672210693 epoch: 2 step: 186, loss is 1.9195256233215332 epoch: 2 step: 187, loss is 1.8233197927474976 epoch: 2 step: 188, loss is 1.7427889108657837 epoch: 2 step: 189, loss is 1.9037153720855713 epoch: 2 step: 190, loss is 1.7720736265182495 epoch: 2 step: 191, loss is 1.6791305541992188 epoch: 2 step: 192, loss is 1.8491283655166626 epoch: 2 step: 193, loss is 1.7785916328430176 epoch: 2 step: 194, loss is 1.7852778434753418 epoch: 2 step: 195, loss is 1.7878570556640625 epoch: 2 step: 196, loss is 1.8266456127166748 epoch: 2 step: 197, loss is 1.7733447551727295 epoch: 2 step: 198, loss is 1.7677949666976929 epoch: 2 step: 199, loss is 1.693023443222046 epoch: 2 step: 200, loss is 1.757947564125061 epoch: 2 step: 201, loss is 1.7988077402114868 epoch: 2 step: 202, loss is 1.8327466249465942 epoch: 2 step: 203, loss is 1.7021350860595703 epoch: 2 step: 204, loss is 1.8033387660980225 epoch: 2 step: 205, loss is 1.7271716594696045 epoch: 2 step: 206, loss is 1.7071491479873657 epoch: 2 step: 207, loss is 1.8059285879135132 epoch: 2 step: 208, loss is 1.7724381685256958 epoch: 2 step: 209, loss is 1.6461853981018066 epoch: 2 step: 210, loss is 1.7500430345535278 epoch: 2 step: 211, loss is 1.7728413343429565 epoch: 2 step: 212, loss is 
1.7478232383728027 epoch: 2 step: 213, loss is 1.805396318435669 epoch: 2 step: 214, loss is 1.7564077377319336 epoch: 2 step: 215, loss is 1.8484628200531006 epoch: 2 step: 216, loss is 1.842712163925171 epoch: 2 step: 217, loss is 1.7719712257385254 epoch: 2 step: 218, loss is 1.7246894836425781 epoch: 2 step: 219, loss is 1.7537436485290527 epoch: 2 step: 220, loss is 1.8012590408325195 epoch: 2 step: 221, loss is 1.7071820497512817 epoch: 2 step: 222, loss is 1.7319221496582031 epoch: 2 step: 223, loss is 1.7418954372406006 epoch: 2 step: 224, loss is 1.756356954574585 epoch: 2 step: 225, loss is 1.9026412963867188 epoch: 2 step: 226, loss is 1.7950100898742676 epoch: 2 step: 227, loss is 1.7687002420425415 epoch: 2 step: 228, loss is 1.7988834381103516 epoch: 2 step: 229, loss is 1.7558226585388184 epoch: 2 step: 230, loss is 1.7972447872161865 epoch: 2 step: 231, loss is 1.7668306827545166 epoch: 2 step: 232, loss is 1.7706915140151978 epoch: 2 step: 233, loss is 1.8523707389831543 epoch: 2 step: 234, loss is 1.7863742113113403 epoch: 2 step: 235, loss is 1.7455711364746094 epoch: 2 step: 236, loss is 1.8494515419006348 epoch: 2 step: 237, loss is 1.8098007440567017 epoch: 2 step: 238, loss is 1.7320868968963623 epoch: 2 step: 239, loss is 1.7073906660079956 epoch: 2 step: 240, loss is 1.7543745040893555 epoch: 2 step: 241, loss is 1.7828773260116577 epoch: 2 step: 242, loss is 1.8017992973327637 epoch: 2 step: 243, loss is 1.75504469871521 epoch: 2 step: 244, loss is 1.8391574621200562 epoch: 2 step: 245, loss is 1.8867601156234741 epoch: 2 step: 246, loss is 1.7903711795806885 epoch: 2 step: 247, loss is 1.724091649055481 epoch: 2 step: 248, loss is 1.756180763244629 epoch: 2 step: 249, loss is 1.7248591184616089 epoch: 2 step: 250, loss is 1.6956037282943726 epoch: 2 step: 251, loss is 1.9044727087020874 epoch: 2 step: 252, loss is 1.7202427387237549 epoch: 2 step: 253, loss is 1.8590054512023926 epoch: 2 step: 254, loss is 1.7541002035140991 epoch: 2 
step: 255, loss is 1.7449924945831299 epoch: 2 step: 256, loss is 1.8348194360733032 epoch: 2 step: 257, loss is 1.8214263916015625 epoch: 2 step: 258, loss is 1.7933714389801025 epoch: 2 step: 259, loss is 1.7224136590957642 epoch: 2 step: 260, loss is 1.8056836128234863 epoch: 2 step: 261, loss is 1.804041862487793 epoch: 2 step: 262, loss is 1.77969229221344 epoch: 2 step: 263, loss is 1.7709102630615234 epoch: 2 step: 264, loss is 1.7914643287658691 epoch: 2 step: 265, loss is 1.7718045711517334 epoch: 2 step: 266, loss is 1.8532987833023071 epoch: 2 step: 267, loss is 1.7140107154846191 epoch: 2 step: 268, loss is 1.868401288986206 epoch: 2 step: 269, loss is 1.831573486328125 epoch: 2 step: 270, loss is 1.9516619443893433 epoch: 2 step: 271, loss is 1.7784669399261475 epoch: 2 step: 272, loss is 1.7207751274108887 epoch: 2 step: 273, loss is 1.823856234550476 epoch: 2 step: 274, loss is 1.8284249305725098 epoch: 2 step: 275, loss is 1.8176841735839844 epoch: 2 step: 276, loss is 1.7832752466201782 epoch: 2 step: 277, loss is 1.8370630741119385 epoch: 2 step: 278, loss is 1.7447307109832764 epoch: 2 step: 279, loss is 1.8517227172851562 epoch: 2 step: 280, loss is 1.7266693115234375 epoch: 2 step: 281, loss is 1.8552228212356567 epoch: 2 step: 282, loss is 1.8542191982269287 epoch: 2 step: 283, loss is 1.8414582014083862 epoch: 2 step: 284, loss is 1.6943976879119873 epoch: 2 step: 285, loss is 1.783471703529358 epoch: 2 step: 286, loss is 1.736589789390564 epoch: 2 step: 287, loss is 1.7182533740997314 epoch: 2 step: 288, loss is 1.672207236289978 epoch: 2 step: 289, loss is 1.7645574808120728 epoch: 2 step: 290, loss is 1.7900241613388062 epoch: 2 step: 291, loss is 1.6984515190124512 epoch: 2 step: 292, loss is 1.822729229927063 epoch: 2 step: 293, loss is 1.8808338642120361 epoch: 2 step: 294, loss is 1.6853476762771606 epoch: 2 step: 295, loss is 1.8157317638397217 epoch: 2 step: 296, loss is 1.8059608936309814 epoch: 2 step: 297, loss is 
1.7534812688827515 epoch: 2 step: 298, loss is 1.76722252368927 epoch: 2 step: 299, loss is 1.8134665489196777 epoch: 2 step: 300, loss is 1.6877107620239258 epoch: 2 step: 301, loss is 1.6185097694396973 epoch: 2 step: 302, loss is 1.7758712768554688 epoch: 2 step: 303, loss is 1.755950689315796 epoch: 2 step: 304, loss is 1.704177737236023 epoch: 2 step: 305, loss is 1.6892074346542358 epoch: 2 step: 306, loss is 1.815639615058899 epoch: 2 step: 307, loss is 1.8549275398254395 epoch: 2 step: 308, loss is 1.8104307651519775 epoch: 2 step: 309, loss is 1.735305666923523 epoch: 2 step: 310, loss is 1.7396290302276611 epoch: 2 step: 311, loss is 1.8549031019210815 epoch: 2 step: 312, loss is 1.7778637409210205 epoch: 2 step: 313, loss is 1.6825811862945557 epoch: 2 step: 314, loss is 1.6482632160186768 epoch: 2 step: 315, loss is 1.7172439098358154 epoch: 2 step: 316, loss is 1.7548999786376953 epoch: 2 step: 317, loss is 1.5849336385726929 epoch: 2 step: 318, loss is 1.7280230522155762 epoch: 2 step: 319, loss is 1.8552422523498535 epoch: 2 step: 320, loss is 1.6886619329452515 epoch: 2 step: 321, loss is 1.7072863578796387 epoch: 2 step: 322, loss is 1.6938064098358154 epoch: 2 step: 323, loss is 1.6820656061172485 epoch: 2 step: 324, loss is 1.7380082607269287 epoch: 2 step: 325, loss is 1.804673671722412 epoch: 2 step: 326, loss is 1.7792102098464966 epoch: 2 step: 327, loss is 1.8998641967773438 epoch: 2 step: 328, loss is 1.7556779384613037 epoch: 2 step: 329, loss is 1.7103122472763062 epoch: 2 step: 330, loss is 1.8625636100769043 epoch: 2 step: 331, loss is 1.771412968635559 epoch: 2 step: 332, loss is 1.7709072828292847 epoch: 2 step: 333, loss is 1.6749439239501953 epoch: 2 step: 334, loss is 1.7068681716918945 epoch: 2 step: 335, loss is 1.7397358417510986 epoch: 2 step: 336, loss is 1.7584718465805054 epoch: 2 step: 337, loss is 1.7621710300445557 epoch: 2 step: 338, loss is 1.7754266262054443 epoch: 2 step: 339, loss is 1.7668453454971313 epoch: 2 step: 
340, loss is 1.8041253089904785 epoch: 2 step: 341, loss is 1.7770304679870605 epoch: 2 step: 342, loss is 1.7730276584625244 epoch: 2 step: 343, loss is 1.7460929155349731 epoch: 2 step: 344, loss is 1.758213758468628 epoch: 2 step: 345, loss is 1.744318962097168 epoch: 2 step: 346, loss is 1.675571322441101 epoch: 2 step: 347, loss is 1.7015323638916016 epoch: 2 step: 348, loss is 1.7086268663406372 epoch: 2 step: 349, loss is 1.9846800565719604 epoch: 2 step: 350, loss is 1.7432291507720947 epoch: 2 step: 351, loss is 1.7636148929595947 epoch: 2 step: 352, loss is 1.7178688049316406 epoch: 2 step: 353, loss is 1.693436861038208 epoch: 2 step: 354, loss is 1.7899049520492554 epoch: 2 step: 355, loss is 1.8009570837020874 epoch: 2 step: 356, loss is 1.7731927633285522 epoch: 2 step: 357, loss is 1.7382633686065674 epoch: 2 step: 358, loss is 1.87336003780365 epoch: 2 step: 359, loss is 1.7038979530334473 epoch: 2 step: 360, loss is 1.7798569202423096 epoch: 2 step: 361, loss is 1.6971291303634644 epoch: 2 step: 362, loss is 1.6986044645309448 epoch: 2 step: 363, loss is 1.8026065826416016 epoch: 2 step: 364, loss is 1.6595711708068848 epoch: 2 step: 365, loss is 1.8221900463104248 epoch: 2 step: 366, loss is 1.7795649766921997 epoch: 2 step: 367, loss is 1.7483983039855957 epoch: 2 step: 368, loss is 1.8899071216583252 epoch: 2 step: 369, loss is 1.6659290790557861 epoch: 2 step: 370, loss is 1.746132254600525 epoch: 2 step: 371, loss is 1.683497428894043 epoch: 2 step: 372, loss is 1.7085784673690796 epoch: 2 step: 373, loss is 1.7269572019577026 epoch: 2 step: 374, loss is 1.7484683990478516 epoch: 2 step: 375, loss is 1.7971100807189941 epoch: 2 step: 376, loss is 1.8449757099151611 epoch: 2 step: 377, loss is 1.8399022817611694 epoch: 2 step: 378, loss is 1.777262806892395 epoch: 2 step: 379, loss is 1.7276949882507324 epoch: 2 step: 380, loss is 1.7123351097106934 epoch: 2 step: 381, loss is 1.7212237119674683 epoch: 2 step: 382, loss is 1.7655616998672485 
epoch: 2 step: 383, loss is 1.7500815391540527 epoch: 2 step: 384, loss is 1.789592981338501 epoch: 2 step: 385, loss is 1.8415806293487549 epoch: 2 step: 386, loss is 1.6377445459365845 epoch: 2 step: 387, loss is 1.6689974069595337 epoch: 2 step: 388, loss is 1.746870994567871 epoch: 2 step: 389, loss is 1.6795246601104736 epoch: 2 step: 390, loss is 1.661218285560608 Train epoch time: 140430.625 ms, per step time: 360.079 ms epoch: 3 step: 1, loss is 1.715036153793335 epoch: 3 step: 2, loss is 1.6474828720092773 epoch: 3 step: 3, loss is 1.657715916633606 epoch: 3 step: 4, loss is 1.5708024501800537 epoch: 3 step: 5, loss is 1.7823667526245117 epoch: 3 step: 6, loss is 1.672377586364746 epoch: 3 step: 7, loss is 1.7155275344848633 epoch: 3 step: 8, loss is 1.6355713605880737 epoch: 3 step: 9, loss is 1.6684916019439697 epoch: 3 step: 10, loss is 1.7313625812530518 epoch: 3 step: 11, loss is 1.6965363025665283 epoch: 3 step: 12, loss is 1.686906337738037 epoch: 3 step: 13, loss is 1.7073938846588135 epoch: 3 step: 14, loss is 1.6544504165649414 epoch: 3 step: 15, loss is 1.747122049331665 epoch: 3 step: 16, loss is 1.7504947185516357 epoch: 3 step: 17, loss is 1.7175284624099731 epoch: 3 step: 18, loss is 1.6662980318069458 epoch: 3 step: 19, loss is 1.7750905752182007 epoch: 3 step: 20, loss is 1.7788915634155273 epoch: 3 step: 21, loss is 1.640061378479004 epoch: 3 step: 22, loss is 1.7295269966125488 epoch: 3 step: 23, loss is 1.7082421779632568 epoch: 3 step: 24, loss is 1.7740626335144043 epoch: 3 step: 25, loss is 1.6252870559692383 epoch: 3 step: 26, loss is 1.7297760248184204 epoch: 3 step: 27, loss is 1.696889877319336 epoch: 3 step: 28, loss is 1.7444108724594116 epoch: 3 step: 29, loss is 1.6641700267791748 epoch: 3 step: 30, loss is 1.6326826810836792 epoch: 3 step: 31, loss is 1.6847602128982544 epoch: 3 step: 32, loss is 1.6761665344238281 epoch: 3 step: 33, loss is 1.8507628440856934 epoch: 3 step: 34, loss is 1.5723615884780884 epoch: 3 step: 35, 
loss is 1.6921000480651855 epoch: 3 step: 36, loss is 1.754352331161499 epoch: 3 step: 37, loss is 1.736189842224121 epoch: 3 step: 38, loss is 1.743882179260254 epoch: 3 step: 39, loss is 1.7992541790008545 epoch: 3 step: 40, loss is 1.774916172027588 epoch: 3 step: 41, loss is 1.7360620498657227 epoch: 3 step: 42, loss is 1.7283296585083008 epoch: 3 step: 43, loss is 1.7825541496276855 epoch: 3 step: 44, loss is 1.7632367610931396 epoch: 3 step: 45, loss is 1.7638394832611084 epoch: 3 step: 46, loss is 1.8444280624389648 epoch: 3 step: 47, loss is 1.7872759103775024 epoch: 3 step: 48, loss is 1.7166814804077148 epoch: 3 step: 49, loss is 1.80313241481781 epoch: 3 step: 50, loss is 1.6236335039138794 epoch: 3 step: 51, loss is 1.6777052879333496 epoch: 3 step: 52, loss is 1.7862696647644043 epoch: 3 step: 53, loss is 1.6133134365081787 epoch: 3 step: 54, loss is 1.695854902267456 epoch: 3 step: 55, loss is 1.7122721672058105 epoch: 3 step: 56, loss is 1.6208970546722412 epoch: 3 step: 57, loss is 1.7131991386413574 epoch: 3 step: 58, loss is 1.7421503067016602 epoch: 3 step: 59, loss is 1.7451450824737549 epoch: 3 step: 60, loss is 1.7532395124435425 epoch: 3 step: 61, loss is 1.6705751419067383 epoch: 3 step: 62, loss is 1.7952446937561035 epoch: 3 step: 63, loss is 1.795323133468628 epoch: 3 step: 64, loss is 1.7619107961654663 epoch: 3 step: 65, loss is 1.824674367904663 epoch: 3 step: 66, loss is 1.6873490810394287 epoch: 3 step: 67, loss is 1.7082254886627197 epoch: 3 step: 68, loss is 1.713441252708435 epoch: 3 step: 69, loss is 1.733770489692688 epoch: 3 step: 70, loss is 1.6999213695526123 epoch: 3 step: 71, loss is 1.653806209564209 epoch: 3 step: 72, loss is 1.7012852430343628 epoch: 3 step: 73, loss is 1.624730110168457 epoch: 3 step: 74, loss is 1.6058542728424072 epoch: 3 step: 75, loss is 1.7592147588729858 epoch: 3 step: 76, loss is 1.7891876697540283 epoch: 3 step: 77, loss is 1.6831679344177246 epoch: 3 step: 78, loss is 1.5972199440002441 epoch: 
3 step: 79, loss is 1.7847541570663452 epoch: 3 step: 80, loss is 1.5809112787246704 epoch: 3 step: 81, loss is 1.8128926753997803 epoch: 3 step: 82, loss is 1.5927855968475342 epoch: 3 step: 83, loss is 1.7312583923339844 epoch: 3 step: 84, loss is 1.753067970275879 epoch: 3 step: 85, loss is 1.738058090209961 epoch: 3 step: 86, loss is 1.7906383275985718 epoch: 3 step: 87, loss is 1.8381223678588867 epoch: 3 step: 88, loss is 1.7381422519683838 epoch: 3 step: 89, loss is 1.7485058307647705 epoch: 3 step: 90, loss is 1.7682496309280396 epoch: 3 step: 91, loss is 1.7035850286483765 epoch: 3 step: 92, loss is 1.692620038986206 epoch: 3 step: 93, loss is 1.6465071439743042 epoch: 3 step: 94, loss is 1.6246577501296997 epoch: 3 step: 95, loss is 1.6367791891098022 epoch: 3 step: 96, loss is 1.958984375 epoch: 3 step: 97, loss is 1.6318498849868774 epoch: 3 step: 98, loss is 1.5361195802688599 epoch: 3 step: 99, loss is 1.6217405796051025 epoch: 3 step: 100, loss is 1.602027177810669 epoch: 3 step: 101, loss is 1.5985596179962158 epoch: 3 step: 102, loss is 1.6872880458831787 epoch: 3 step: 103, loss is 1.7511180639266968 epoch: 3 step: 104, loss is 1.8850643634796143 epoch: 3 step: 105, loss is 1.7996480464935303 epoch: 3 step: 106, loss is 1.6875202655792236 epoch: 3 step: 107, loss is 1.713998794555664 epoch: 3 step: 108, loss is 1.5832546949386597 epoch: 3 step: 109, loss is 1.6304515600204468 epoch: 3 step: 110, loss is 1.7660911083221436 epoch: 3 step: 111, loss is 1.7160937786102295 epoch: 3 step: 112, loss is 1.5161769390106201 epoch: 3 step: 113, loss is 1.6995164155960083 epoch: 3 step: 114, loss is 1.8551595211029053 epoch: 3 step: 115, loss is 1.7827998399734497 epoch: 3 step: 116, loss is 1.6238816976547241 epoch: 3 step: 117, loss is 1.8509113788604736 epoch: 3 step: 118, loss is 1.7182554006576538 epoch: 3 step: 119, loss is 1.7552552223205566 epoch: 3 step: 121, loss is 1.661957025527954 epoch: 3 step: 122, loss is 1.6754220724105835 epoch: 3 step: 123, 
loss is 1.7662798166275024 epoch: 3 step: 124, loss is 1.7356739044189453 epoch: 3 step: 125, loss is 1.8035058975219727 epoch: 3 step: 126, loss is 1.692866563796997 epoch: 3 step: 127, loss is 1.7170971632003784 epoch: 3 step: 128, loss is 1.6585720777511597 epoch: 3 step: 129, loss is 1.72513747215271 epoch: 3 step: 130, loss is 1.7004284858703613 epoch: 3 step: 131, loss is 1.676867961883545 epoch: 3 step: 132, loss is 1.6530230045318604 epoch: 3 step: 133, loss is 1.7507742643356323 epoch: 3 step: 134, loss is 1.7966041564941406 epoch: 3 step: 135, loss is 1.7693893909454346 epoch: 3 step: 136, loss is 1.6658430099487305 epoch: 3 step: 137, loss is 1.6244542598724365 epoch: 3 step: 138, loss is 1.6259570121765137 epoch: 3 step: 139, loss is 1.6822419166564941 epoch: 3 step: 140, loss is 1.850181221961975 epoch: 3 step: 141, loss is 1.7429217100143433 epoch: 3 step: 142, loss is 1.69063138961792 epoch: 3 step: 143, loss is 1.7009793519973755 epoch: 3 step: 144, loss is 1.7540186643600464 epoch: 3 step: 145, loss is 1.7211129665374756 epoch: 3 step: 146, loss is 1.7596542835235596 epoch: 3 step: 147, loss is 1.621204137802124 epoch: 3 step: 148, loss is 1.6826605796813965 epoch: 3 step: 149, loss is 1.7301777601242065 epoch: 3 step: 150, loss is 1.7946531772613525 epoch: 3 step: 151, loss is 1.5984761714935303 epoch: 3 step: 152, loss is 1.6736040115356445 epoch: 3 step: 153, loss is 1.6323038339614868 epoch: 3 step: 154, loss is 1.6896190643310547 epoch: 3 step: 155, loss is 1.722955346107483 epoch: 3 step: 156, loss is 1.6411956548690796 epoch: 3 step: 157, loss is 1.7016927003860474 epoch: 3 step: 158, loss is 1.7301901578903198 epoch: 3 step: 159, loss is 1.7329767942428589 epoch: 3 step: 160, loss is 1.6560049057006836 epoch: 3 step: 161, loss is 1.7290512323379517 epoch: 3 step: 162, loss is 1.6875516176223755 epoch: 3 step: 163, loss is 1.6642581224441528 epoch: 3 step: 164, loss is 1.8076648712158203 epoch: 3 step: 165, loss is 1.7624316215515137 epoch: 
3 step: 166, loss is 1.76237952709198 epoch: 3 step: 167, loss is 1.6950775384902954 epoch: 3 step: 168, loss is 1.620380163192749 epoch: 3 step: 169, loss is 1.8368569612503052 epoch: 3 step: 170, loss is 1.6234006881713867 epoch: 3 step: 171, loss is 1.9272172451019287 epoch: 3 step: 172, loss is 1.7308259010314941 epoch: 3 step: 173, loss is 1.7092185020446777 epoch: 3 step: 174, loss is 1.677390217781067 epoch: 3 step: 175, loss is 1.6899553537368774 epoch: 3 step: 176, loss is 1.7895318269729614 epoch: 3 step: 177, loss is 1.6374738216400146 epoch: 3 step: 178, loss is 1.6670913696289062 epoch: 3 step: 179, loss is 1.7139530181884766 epoch: 3 step: 180, loss is 1.7262768745422363 epoch: 3 step: 181, loss is 1.7273998260498047 epoch: 3 step: 182, loss is 1.6726765632629395 epoch: 3 step: 183, loss is 1.828143835067749 epoch: 3 step: 184, loss is 1.6950719356536865 epoch: 3 step: 185, loss is 1.6884915828704834 epoch: 3 step: 186, loss is 1.6581523418426514 epoch: 3 step: 187, loss is 1.6636587381362915 epoch: 3 step: 188, loss is 1.7778453826904297 epoch: 3 step: 189, loss is 1.6951615810394287 epoch: 3 step: 190, loss is 1.7078237533569336 epoch: 3 step: 191, loss is 1.7957340478897095 epoch: 3 step: 192, loss is 1.5808676481246948 epoch: 3 step: 193, loss is 1.7798625230789185 epoch: 3 step: 194, loss is 1.711366057395935 epoch: 3 step: 195, loss is 1.6328104734420776 epoch: 3 step: 196, loss is 1.673056721687317 epoch: 3 step: 197, loss is 1.8323453664779663 epoch: 3 step: 198, loss is 1.7388792037963867 epoch: 3 step: 199, loss is 1.6818656921386719 epoch: 3 step: 200, loss is 1.7251689434051514 epoch: 3 step: 201, loss is 1.7291287183761597 epoch: 3 step: 202, loss is 1.7162761688232422 epoch: 3 step: 203, loss is 1.7972345352172852 epoch: 3 step: 204, loss is 1.685490608215332 epoch: 3 step: 205, loss is 1.8276681900024414 epoch: 3 step: 206, loss is 1.7020764350891113 epoch: 3 step: 207, loss is 1.6709315776824951 epoch: 3 step: 208, loss is 
1.6524537801742554 epoch: 3 step: 209, loss is 1.6768217086791992 epoch: 3 step: 210, loss is 1.7436610460281372 epoch: 3 step: 211, loss is 1.733081579208374 epoch: 3 step: 212, loss is 1.6846342086791992 epoch: 3 step: 213, loss is 1.6859984397888184 epoch: 3 step: 214, loss is 1.8113853931427002 epoch: 3 step: 215, loss is 1.6925203800201416 epoch: 3 step: 216, loss is 1.6171596050262451 epoch: 3 step: 217, loss is 1.6766417026519775 epoch: 3 step: 218, loss is 1.7185215950012207 epoch: 3 step: 219, loss is 1.5875868797302246 epoch: 3 step: 220, loss is 1.7078516483306885 epoch: 3 step: 221, loss is 1.6866421699523926 epoch: 3 step: 222, loss is 1.7786364555358887 epoch: 3 step: 223, loss is 1.6472911834716797 epoch: 3 step: 224, loss is 1.6140011548995972 epoch: 3 step: 225, loss is 1.6925756931304932 epoch: 3 step: 226, loss is 1.6128917932510376 epoch: 3 step: 227, loss is 1.814792275428772 epoch: 3 step: 228, loss is 1.6881840229034424 epoch: 3 step: 229, loss is 1.7506282329559326 epoch: 3 step: 230, loss is 1.6750633716583252 epoch: 3 step: 231, loss is 1.652660608291626 epoch: 3 step: 232, loss is 1.6869404315948486 epoch: 3 step: 233, loss is 1.6548941135406494 epoch: 3 step: 234, loss is 1.7189624309539795 epoch: 3 step: 235, loss is 1.6367188692092896 epoch: 3 step: 236, loss is 1.8399276733398438 epoch: 3 step: 237, loss is 1.6386637687683105 epoch: 3 step: 238, loss is 1.6106817722320557 epoch: 3 step: 239, loss is 1.5091688632965088 epoch: 3 step: 240, loss is 1.7534894943237305 epoch: 3 step: 241, loss is 1.7333624362945557 epoch: 3 step: 242, loss is 1.8093721866607666 epoch: 3 step: 243, loss is 1.7643702030181885 epoch: 3 step: 244, loss is 1.801408052444458 epoch: 3 step: 245, loss is 1.7226362228393555 epoch: 3 step: 246, loss is 1.681382656097412 epoch: 3 step: 247, loss is 1.7535449266433716 epoch: 3 step: 248, loss is 1.7454997301101685 epoch: 3 step: 249, loss is 1.625917911529541 epoch: 3 step: 250, loss is 1.6731133460998535 epoch: 3 
step: 251, loss is 1.7583502531051636 epoch: 3 step: 252, loss is 1.670466423034668 epoch: 3 step: 253, loss is 1.7863614559173584 epoch: 3 step: 254, loss is 1.642508625984192 epoch: 3 step: 255, loss is 1.7797843217849731 epoch: 3 step: 256, loss is 1.7379049062728882 epoch: 3 step: 257, loss is 1.7443997859954834 epoch: 3 step: 258, loss is 1.656363606452942 epoch: 3 step: 259, loss is 1.6958774328231812 epoch: 3 step: 260, loss is 1.645730972290039 epoch: 3 step: 261, loss is 1.7251406908035278 epoch: 3 step: 262, loss is 1.578377366065979 epoch: 3 step: 263, loss is 1.6736092567443848 epoch: 3 step: 264, loss is 1.77023184299469 epoch: 3 step: 265, loss is 1.7185709476470947 epoch: 3 step: 266, loss is 1.5423574447631836 epoch: 3 step: 267, loss is 1.7092128992080688 epoch: 3 step: 268, loss is 1.6410908699035645 epoch: 3 step: 269, loss is 1.6624755859375 epoch: 3 step: 270, loss is 1.5723826885223389 epoch: 3 step: 271, loss is 1.6907060146331787 epoch: 3 step: 272, loss is 1.6497552394866943 epoch: 3 step: 273, loss is 1.6850025653839111 epoch: 3 step: 274, loss is 1.6133666038513184 epoch: 3 step: 275, loss is 1.6401327848434448 epoch: 3 step: 276, loss is 1.5669808387756348 epoch: 3 step: 277, loss is 1.6670163869857788 epoch: 3 step: 278, loss is 1.5849509239196777 epoch: 3 step: 279, loss is 1.6250602006912231 epoch: 3 step: 280, loss is 1.6068509817123413 epoch: 3 step: 281, loss is 1.687977910041809 epoch: 3 step: 282, loss is 1.7549883127212524 epoch: 3 step: 283, loss is 1.601057767868042 epoch: 3 step: 284, loss is 1.5960354804992676 epoch: 3 step: 285, loss is 1.6581069231033325 epoch: 3 step: 286, loss is 1.620342493057251 epoch: 3 step: 287, loss is 1.5567959547042847 epoch: 3 step: 288, loss is 1.6604020595550537 epoch: 3 step: 289, loss is 1.7468873262405396 epoch: 3 step: 290, loss is 1.6225519180297852 epoch: 3 step: 291, loss is 1.6740832328796387 epoch: 3 step: 292, loss is 1.6024932861328125 epoch: 3 step: 293, loss is 1.7326544523239136 
epoch: 3 step: 294, loss is 1.7520151138305664 epoch: 3 step: 295, loss is 1.7345353364944458 epoch: 3 step: 296, loss is 1.658333659172058 epoch: 3 step: 297, loss is 1.6240086555480957 epoch: 3 step: 298, loss is 1.7309271097183228 epoch: 3 step: 299, loss is 1.6567069292068481 epoch: 3 step: 300, loss is 1.6143699884414673 epoch: 3 step: 301, loss is 1.6671743392944336 epoch: 3 step: 302, loss is 1.61781907081604 epoch: 3 step: 303, loss is 1.7606620788574219 epoch: 3 step: 304, loss is 1.6801334619522095 epoch: 3 step: 305, loss is 1.607485055923462 epoch: 3 step: 306, loss is 1.63124680519104 epoch: 3 step: 307, loss is 1.6822651624679565 epoch: 3 step: 308, loss is 1.6892354488372803 epoch: 3 step: 309, loss is 1.5519628524780273 epoch: 3 step: 310, loss is 1.5416107177734375 epoch: 3 step: 311, loss is 1.7223525047302246 epoch: 3 step: 312, loss is 1.6520583629608154 epoch: 3 step: 313, loss is 1.7451046705245972 epoch: 3 step: 314, loss is 1.7313584089279175 epoch: 3 step: 315, loss is 1.5561679601669312 epoch: 3 step: 316, loss is 1.5857634544372559 epoch: 3 step: 317, loss is 1.7505254745483398 epoch: 3 step: 318, loss is 1.5984164476394653 epoch: 3 step: 319, loss is 1.7173864841461182 epoch: 3 step: 320, loss is 1.657182216644287 epoch: 3 step: 321, loss is 1.5848265886306763 epoch: 3 step: 322, loss is 1.754823088645935 epoch: 3 step: 323, loss is 1.7772974967956543 epoch: 3 step: 324, loss is 1.6578189134597778 epoch: 3 step: 325, loss is 1.6364493370056152 epoch: 3 step: 326, loss is 1.717586636543274 epoch: 3 step: 327, loss is 1.676952838897705 epoch: 3 step: 328, loss is 1.7036805152893066 epoch: 3 step: 329, loss is 1.7681641578674316 epoch: 3 step: 330, loss is 1.6032052040100098 epoch: 3 step: 331, loss is 1.629135012626648 epoch: 3 step: 332, loss is 1.6627397537231445 epoch: 3 step: 333, loss is 1.589951992034912 epoch: 3 step: 334, loss is 1.6370927095413208 epoch: 3 step: 335, loss is 1.8232088088989258 epoch: 3 step: 336, loss is 
epoch: 3 step: 337, loss is 1.6314421892166138
...
epoch: 3 step: 390, loss is 1.6646846532821655
Train epoch time: 137277.460 ms, per step time: 351.993 ms
epoch: 4 step: 1, loss is 1.7665646076202393
...
epoch: 4 step: 390, loss is 1.5936779975891113
Train epoch time: 138425.337 ms, per step time: 354.937 ms
epoch: 5 step: 1, loss is 1.5464458465576172
...
epoch: 5 step: 390, loss is 1.676946997642517
Train epoch time: 140180.187 ms, per step time: 359.436 ms
total time: 0h 17m 58s
============== Train Success ==============
The trained model is saved in the current directory as shufflenetv1-5_390.ckpt and is used for evaluation below.
The train function defines the training process. MindSpore runs in PyNative mode with Ascend as the target device. The loss function is cross entropy, the learning rate follows a cosine-annealing schedule, and the optimizer is Momentum.
The Model class wraps the network, loss function, and optimizer, with automatic mixed precision (AMP) enabled. For callbacks, CheckpointConfig configures ModelCheckpoint, which saves model parameters automatically; TimeMonitor tracks elapsed time, and LossMonitor tracks the loss value.
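The cosine-annealing schedule mentioned above can be sketched in a few lines of plain Python. Note that cosine_annealing_lr is a hypothetical helper written for illustration only; the notebook builds its actual schedule with MindSpore utilities in an earlier cell.

```python
import math

def cosine_annealing_lr(lr_max, lr_min, total_steps):
    """Per-step learning rates decayed from lr_max to lr_min by cosine annealing."""
    return [lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * i / (total_steps - 1)))
            for i in range(total_steps)]

lrs = cosine_annealing_lr(lr_max=0.5, lr_min=0.0, total_steps=5)
# starts at lr_max, decreases monotonically, ends at lr_min
```

The schedule starts flat near lr_max and flattens out again near lr_min, which tends to stabilize both the early and the final phases of training.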
Model Evaluation
Evaluate the model on the CIFAR-10 test set.
After setting the path of the checkpoint to evaluate, load the dataset, register Top-1 and Top-5 accuracy as the evaluation metrics, and finally evaluate the model through the model.eval() interface.
import time
import mindspore
import mindspore.nn as nn
from mindspore import load_checkpoint, load_param_into_net

# get_dataset, ShuffleNetV1, Model, Top1CategoricalAccuracy and
# Top5CategoricalAccuracy are defined/imported in the earlier cells.

def test():
    mindspore.set_context(mode=mindspore.GRAPH_MODE, device_target="Ascend")
    dataset = get_dataset("./dataset/cifar-10-batches-bin", 128, "test")
    net = ShuffleNetV1(model_size="2.0x", n_class=10)
    param_dict = load_checkpoint("shufflenetv1-5_390.ckpt")
    load_param_into_net(net, param_dict)
    net.set_train(False)
    loss = nn.CrossEntropyLoss(weight=None, reduction='mean', label_smoothing=0.1)
    eval_metrics = {'Loss': nn.Loss(), 'Top_1_Acc': Top1CategoricalAccuracy(),
                    'Top_5_Acc': Top5CategoricalAccuracy()}
    model = Model(net, loss_fn=loss, metrics=eval_metrics)
    start_time = time.time()
    res = model.eval(dataset, dataset_sink_mode=False)
    use_time = time.time() - start_time
    hour = str(int(use_time // 60 // 60))
    minute = str(int(use_time // 60 % 60))
    second = str(int(use_time % 60))
    log = "result:" + str(res) + ", ckpt:'" + "./shufflenetv1-5_390.ckpt" \
          + "', time: " + hour + "h " + minute + "m " + second + "s"
    print(log)
    filename = './eval_log.txt'
    with open(filename, 'a') as file_object:
        file_object.write(log + '\n')

if __name__ == '__main__':
    test()
model size is 2.0x
result:{'Loss': 1.5920430620511372, 'Top_1_Acc': 0.5093149038461539, 'Top_5_Acc': 0.9325921474358975}, ckpt:'./shufflenetv1-5_390.ckpt', time: 0h 1m 37s
Set MindSpore to graph mode (GRAPH_MODE) with Ascend as the device. Load the CIFAR-10 test set with the get_dataset function defined earlier. Load the pretrained ShuffleNetV1 checkpoint and switch the network to evaluation mode (not training). Define the loss function and the evaluation metrics (loss, Top-1 accuracy, and Top-5 accuracy). Evaluate the model with the model.eval interface while recording the elapsed time, then print the result and append it to a log file.
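As a sanity check on what the Top-1 and Top-5 metrics measure, here is a small NumPy sketch. topk_accuracy is an illustrative helper, not a MindSpore API, and the logits below are fabricated for the example.

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of samples whose true label is among the k highest logits."""
    topk = np.argsort(logits, axis=1)[:, -k:]  # indices of the k largest per row
    hits = [labels[i] in topk[i] for i in range(len(labels))]
    return float(np.mean(hits))

logits = np.array([[0.10, 0.70, 0.20],
                   [0.80, 0.15, 0.05],
                   [0.35, 0.25, 0.40]])
labels = np.array([1, 2, 0])
acc1 = topk_accuracy(logits, labels, 1)  # 1/3: only the first row's top logit matches
acc2 = topk_accuracy(logits, labels, 2)  # 2/3: rows 1 and 3 hit within the top two
```

Top-5 accuracy on a 10-class problem is naturally much higher than Top-1, which matches the gap seen in the evaluation result above.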
Model Prediction
Run prediction on CIFAR-10 images with the trained model and visualize the results.
import numpy as np
import matplotlib.pyplot as plt
import mindspore as ms
import mindspore.dataset as ds
import mindspore.dataset.vision as vision
from mindspore import load_checkpoint, load_param_into_net

# ShuffleNetV1 and Model are defined/imported in the earlier cells.
net = ShuffleNetV1(model_size="2.0x", n_class=10)
param_dict = load_checkpoint("shufflenetv1-5_390.ckpt")
load_param_into_net(net, param_dict)
model = Model(net)
dataset_predict = ds.Cifar10Dataset(dataset_dir="./dataset/cifar-10-batches-bin", shuffle=False, usage="train")
dataset_show = ds.Cifar10Dataset(dataset_dir="./dataset/cifar-10-batches-bin", shuffle=False, usage="train")
dataset_show = dataset_show.batch(16)
show_images_lst = next(dataset_show.create_dict_iterator())["image"].asnumpy()
image_trans = [
    vision.RandomCrop((32, 32), (4, 4, 4, 4)),
    vision.RandomHorizontalFlip(prob=0.5),
    vision.Resize((224, 224)),
    vision.Rescale(1.0 / 255.0, 0.0),
    vision.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
    vision.HWC2CHW()
]
dataset_predict = dataset_predict.map(image_trans, 'image')
dataset_predict = dataset_predict.batch(16)
class_dict = {0: "airplane", 1: "automobile", 2: "bird", 3: "cat", 4: "deer",
              5: "dog", 6: "frog", 7: "horse", 8: "ship", 9: "truck"}
# Show inference results (the predicted label is above each image)
plt.figure(figsize=(16, 5))
predict_data = next(dataset_predict.create_dict_iterator())
output = model.predict(ms.Tensor(predict_data['image']))
pred = np.argmax(output.asnumpy(), axis=1)
index = 0
for image in show_images_lst:
    plt.subplot(2, 8, index + 1)
    plt.title('{}'.format(class_dict[pred[index]]))
    index += 1
    plt.imshow(image)
    plt.axis("off")
plt.show()
model size is 2.0x
Load the pretrained model and parameters. Prepare the dataset for prediction and apply the preprocessing pipeline (random crop, horizontal flip, resize, rescale, normalize), then batch the preprocessed dataset. Run the model.predict() interface on a batch and display the predicted labels together with the corresponding images.
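Decoding the network output into class names is a plain argmax over the logits, as in this toy sketch. The logits here are fabricated one-hot scores for three images, not real model output; class_dict is the CIFAR-10 label mapping used in the notebook.

```python
import numpy as np

class_dict = {0: "airplane", 1: "automobile", 2: "bird", 3: "cat", 4: "deer",
              5: "dog", 6: "frog", 7: "horse", 8: "ship", 9: "truck"}

# Fabricated logits for 3 images over the 10 CIFAR-10 classes
logits = np.zeros((3, 10))
logits[0, 3] = 1.0  # highest score on class 3 (cat)
logits[1, 8] = 1.0  # highest score on class 8 (ship)
logits[2, 0] = 1.0  # highest score on class 0 (airplane)

pred = np.argmax(logits, axis=1)            # class index per image
names = [class_dict[i] for i in pred]       # ['cat', 'ship', 'airplane']
```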
Core idea: ShuffleNet introduces Pointwise Group Convolution and Channel Shuffle, which significantly reduce the model's computational cost while preserving accuracy. Achieving strong accuracy under a limited compute budget makes it well suited to mobile and embedded devices.
Key techniques: Pointwise Group Convolution reduces the parameter count and computation through grouped convolution; Channel Shuffle rearranges channels to break the information isolation between groups.
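The Channel Shuffle operation itself is just a reshape-transpose-reshape over the channel axis, as this NumPy sketch (illustrative only; the notebook implements it with MindSpore ops) shows:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Channel shuffle on an NCHW tensor: split C into (groups, C//groups),
    swap those two axes, then flatten back so channels interleave across groups."""
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

x = np.arange(6).reshape(1, 6, 1, 1)   # channels [0, 1, 2, 3, 4, 5]
y = channel_shuffle(x, groups=2)       # channels become [0, 3, 1, 4, 2, 5]
```

After the shuffle, each group in the next grouped convolution sees channels originating from every group of the previous layer, which is exactly what defeats the information isolation described above.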
Architecture: the ShuffleNet block improves on the ResNet Bottleneck unit. The full network stacks these blocks in stages, progressively increasing the channel count and shrinking the feature maps through repetition and downsampling.
Training and evaluation: the model is trained and evaluated on CIFAR-10 from randomly initialized parameters, using cross-entropy loss, the Momentum optimizer, and a cosine-annealing learning-rate schedule. Top-1 and Top-5 accuracy on the test set serve as the evaluation metrics to validate the model's performance.
Visualization: visualizing the predictions gives an intuitive view of the model's classification quality.