1. SER training error: SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception
1.1. Problem description
The following error occurred when running a training job.
Single-GPU training:
python3 tools/train.py -c train_data/my_data/ser_vi_layoutxlm_xfund_zh.yml
The error message is as follows:
Traceback (most recent call last):
  File "/root/anaconda3/envs/paddle38/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/envs/paddle38/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 536, in _thread_loop
    batch = self._get_data()
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 638, in _get_data
    raise RuntimeError("DataLoader {} workers exit unexpectedly, " \
RuntimeError: DataLoader 1 workers exit unexpectedly, pids: 1129

Traceback (most recent call last):
  File "tools/train.py", line 208, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 180, in main
    program.train(config, train_dataloader, valid_dataloader, device, model,
  File "/paddle/PaddleOCR/tools/program.py", line 258, in train
    for idx, batch in enumerate(train_dataloader):
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 745, in __next__
    self._reader.read_next_list()[0])
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:175)
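1.2. Problem analysis
I searched online extensively, but nothing I found solved the problem. I then went carefully through the error log and compared the official sample data file XFUND/zh_train/train.json with the Label.txt file generated by our PPOCRLabel annotation, and found that every annotation in our Label.txt was missing a label attribute.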
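A check like this can be automated instead of comparing the files by eye. The sketch below is only an illustration and assumes that each line of Label.txt has the form "image path, a tab, then a JSON list of annotation objects"; the function name find_missing_key and the sample lines are made up for this example.

```python
import json


def find_missing_key(label_lines, key):
    """Return (image_path, annotation_index) pairs whose annotation lacks `key`.

    Assumes each line is "<image path>\t<JSON list of annotation dicts>".
    """
    missing = []
    for line in label_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        image_path, annotations_json = line.split("\t", 1)
        for index, annotation in enumerate(json.loads(annotations_json)):
            if key not in annotation:
                missing.append((image_path, index))
    return missing


# Made-up sample lines: the first annotation has no "label", the second one does.
sample = [
    'imgs/1.jpg\t[{"transcription": "Name", "points": [[0, 0], [10, 0], [10, 5], [0, 5]], "difficult": false}]',
    'imgs/2.jpg\t[{"transcription": "Date", "label": "question", "points": [[0, 0], [10, 0], [10, 5], [0, 5]]}]',
]
print(find_missing_key(sample, "label"))  # [('imgs/1.jpg', 0)]
```

To run it against a real file, pass the open file object (for example open("Label.txt", encoding="utf-8")) as label_lines.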
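1.3. Solution
Edit our Label.txt file and add a "label" field before the "points" field.
I simply did a string replacement:
String to replace: , "points"
Replacement string: , "label": "other", "points"
The end result is that every recognized box has a label field.
After adding the field, I ran the command again and training completed successfully.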
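As an alternative to a plain string replacement, the field can be added by parsing the JSON, which does not depend on the exact spacing or quote characters in the file and leaves annotations that already have the field untouched. This is a minimal sketch, assuming each Label.txt line is "image path, a tab, then a JSON list of annotation objects"; add_default_fields is a hypothetical helper, not part of PaddleOCR or PPOCRLabel.

```python
import json


def add_default_fields(line, defaults):
    """Add each missing key from `defaults` to every annotation on one line.

    Assumes the line is "<image path>\t<JSON list of annotation dicts>".
    Keys that already exist are left unchanged.
    """
    image_path, annotations_json = line.rstrip("\n").split("\t", 1)
    annotations = json.loads(annotations_json)
    for annotation in annotations:
        for key, value in defaults.items():
            annotation.setdefault(key, value)
    # ensure_ascii=False keeps Chinese transcriptions readable in the output.
    return image_path + "\t" + json.dumps(annotations, ensure_ascii=False)


# Made-up sample line without a "label" field.
sample = 'imgs/1.jpg\t[{"transcription": "Name", "points": [[0, 0], [10, 0], [10, 5], [0, 5]]}]'
fixed = add_default_fields(sample, {"label": "other"})
print(fixed)
```

The new key is appended at the end of each object rather than placed before "points"; JSON object key order does not matter to a JSON parser, so the result should be equivalent for training purposes.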
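2. RE training error: KeyError: 'linking'
2.1. Problem analysis
As with the problem above, every annotation in our Label.txt file was missing a linking attribute.
2.2. Solution
Edit our Label.txt file and add a "linking" field before the "points" field.
I simply did a string replacement:
String to replace: , "points"
Replacement string: , "linking": [], "points"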
3. RE training error: KeyError: 'NONE' or KeyError: 'B-NONE'
3.1. Problem description
The following error occurred when running RE training.
Script executed:
vim train_data/myimgs/re_vi_layoutxlm_xfund_zh.yml
Error message:
, error happened with msg: Traceback (most recent call last):
  File "/paddle/PaddleOCR/ppocr/data/simple_dataset.py", line 137, in __getitem__
    outs = transform(data, self.ops)
  File "/paddle/PaddleOCR/ppocr/data/imaug/__init__.py", line 56, in transform
    data = op(data)
  File "/paddle/PaddleOCR/ppocr/data/imaug/label_ops.py", line 1093, in __call__
    gt_label = self._parse_label(label, encode_res)
  File "/paddle/PaddleOCR/ppocr/data/imaug/label_ops.py", line 1177, in _parse_label
    gt_label.append(self.label2id_map[("b-" + label).upper()])
KeyError: 'B-NONE'
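3.2. Problem analysis
The error is raised in /paddle/PaddleOCR/ppocr/data/imaug/label_ops.py. Opening the file shows that the label handling is hardcoded: only the labels in ["other", "others", "ignore"] are special-cased. Any other label is looked up in label2id_map as B-<LABEL> and I-<LABEL>, which raises a KeyError when that key is not in the map.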
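To see in advance which labels would trigger this KeyError, the labels used in the annotations can be compared with the class names that the map is built from. The sketch below is only an illustration: it assumes the map contains a B-<CLASS> key for every configured class name (matching the 'B-NONE' key in the traceback), and both the function unknown_labels and the class list shown are hypothetical.

```python
def unknown_labels(labels, class_names, ignored=("other", "others", "ignore")):
    """Return the labels that would be looked up as B-<LABEL> and not be found.

    `class_names` stands in for the configured classes (an assumption);
    `ignored` mirrors the hardcoded list in _parse_label.
    """
    known = {name.strip().upper() for name in class_names}
    bad = set()
    for label in labels:
        if label.lower() in ignored:
            continue  # these are mapped to 0 without a lookup
        if label.upper() not in known:
            bad.add(label)
    return sorted(bad)


# Hypothetical class list and labels collected from an annotation file.
class_names = ["OTHER", "QUESTION", "ANSWER", "HEADER"]
labels = ["other", "question", "None", "answer"]
print(unknown_labels(labels, class_names))  # ['None']
```

Any label reported here either needs to be added to the class list used for training or be renamed in the annotations.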
3.3. Solution
Modify the label_ops.py file:
vim /paddle/PaddleOCR/ppocr/data/imaug/label_ops.py
According to the error message, the relevant code is around line 1177.
Before the change:
def _parse_label(self, label, encode_res):
    gt_label = []
    if label.lower() in ["other", "others", "ignore"]:
        gt_label.extend([0] * len(encode_res["input_ids"]))
    else:
        gt_label.append(self.label2id_map[("b-" + label).upper()])
        gt_label.extend([self.label2id_map[("i-" + label).upper()]] *
                        (len(encode_res["input_ids"]) - 1))
    return gt_label
After the change:
def _parse_label(self, label, encode_res):
    gt_label = []
    if label.lower() in ["other", "others", "ignore", "header", "question",
                         "answer", "none", "key", "value"]:
        gt_label.extend([0] * len(encode_res["input_ids"]))
    else:
        gt_label.append(self.label2id_map[("b-" + label).upper()])
        gt_label.extend([self.label2id_map[("i-" + label).upper()]] *
                        (len(encode_res["input_ids"]) - 1))
    return gt_label
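The effect of the change can be reproduced outside PaddleOCR with a standalone version of the same logic. This is a sketch under stated assumptions: parse_label is a simplified copy written for this example, the number of tokens replaces len(encode_res["input_ids"]), and the toy label2id_map is made up.

```python
def parse_label(label, num_tokens, label2id_map, zero_labels):
    """Standalone copy of the _parse_label logic, for experimentation only."""
    if label.lower() in zero_labels:
        return [0] * num_tokens
    begin_id = label2id_map[("b-" + label).upper()]
    inside_id = label2id_map[("i-" + label).upper()]
    return [begin_id] + [inside_id] * (num_tokens - 1)


# Toy map (an assumption); the real one is built by PaddleOCR from the class list.
label2id_map = {"O": 0, "B-QUESTION": 1, "I-QUESTION": 2}
original = ["other", "others", "ignore"]
patched = original + ["header", "question", "answer", "none", "key", "value"]

print(parse_label("question", 3, label2id_map, original))  # [1, 2, 2]
print(parse_label("question", 3, label2id_map, patched))   # [0, 0, 0]
print(parse_label("None", 3, label2id_map, patched))       # [0, 0, 0]
try:
    parse_label("None", 3, label2id_map, original)
except KeyError as error:
    print("KeyError:", error)  # KeyError: 'B-NONE'
```

As the second line of output shows, every label added to the list is encoded as 0, the same as "other", so labels such as "question" and "answer" are no longer distinguished by this function after the change.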