GitHub - 649453932/Bert-Chinese-Text-Classification-Pytorch: Chinese text classification with BERT and ERNIE. https://github.com/649453932/Bert-Chinese-Text-Classification-Pytorch
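There is a project on GitHub that does Chinese text classification with BERT and ERNIE on PyTorch. It runs well, but a few things needed changing while I was using it.
1. At runtime it fails because the THUCNews/saved_dict directory does not exist. Creating the folder fixes it.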
# Chinese models
# https://github.com/649453932/Bert-Chinese-Text-Classification-Pytorch/tree/master
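Download links for the pretrained models:
bert_Chinese: model https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz
vocabulary https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt
Backup: Baidu Netdisk link for the model: https://pan.baidu.com/s/1qSAD5gwClq7xlgzl_4W3Pw
ERNIE_Chinese: http://image.nghuyong.top/ERNIE.zip
Backup: Baidu Netdisk link: https://pan.baidu.com/s/1lEPdDN1-YQJmKEd_g9rLgw
After extracting the archives, put the files in the directories the project README specifies and check that the file names match exactly.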
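A quick way to confirm the files landed where the code expects them is a small check like the one below. This is only a sketch: `missing_files` is a hypothetical helper, and the file names and the two directory names (`bert_pretain`, `ERNIE_pretrain`) are assumptions based on what the project README describes, so verify them against your own checkout.

```python
from pathlib import Path

# Assumed file names; confirm them against the project README.
EXPECTED_FILES = ("pytorch_model.bin", "bert_config.json", "vocab.txt")


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Both directory names are assumptions; adjust them to your checkout.
    for folder in ("bert_pretain", "ERNIE_pretrain"):
        gaps = missing_files(folder)
        print(folder, "OK" if not gaps else f"missing: {gaps}")
```

Run it from the project root. An empty result for a folder means every expected file was found there.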
# Create the missing folder
mkdir -p THUCNews/saved_dict/
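If you would rather not depend on a shell command (on Windows, for example), the same directory can be created from Python. A minimal sketch, where `ensure_save_dir` is a hypothetical helper and not part of the project:

```python
from pathlib import Path


def ensure_save_dir(dataset_dir="THUCNews"):
    """Create <dataset_dir>/saved_dict if it is missing and return its path."""
    save_dir = Path(dataset_dir) / "saved_dict"
    # parents=True and exist_ok=True give the same behavior as `mkdir -p`.
    save_dir.mkdir(parents=True, exist_ok=True)
    return save_dir
```

Calling `ensure_save_dir()` from the project root before training has the same effect as the `mkdir -p` command above, and calling it again when the folder already exists is harmless.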
2. The project has a few dependencies that need to be installed:
pip install torch
pip install tqdm scikit-learn tensorboardX -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install boto3 requests regex
python3 run.py --model bert
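If the run fails with an import error, a short check like this shows which packages are still missing. It is a sketch: `missing_modules` is a hypothetical helper, and the list simply mirrors the pip commands above (note that scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the packages installed above (scikit-learn imports as sklearn).
REQUIRED = ("torch", "tqdm", "sklearn", "tensorboardX", "boto3", "requests", "regex")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    gaps = missing_modules()
    print("all dependencies found" if not gaps else f"missing: {gaps}")
```

`find_spec` only looks the module up and does not import it, so the check stays fast even with PyTorch installed.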
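3. The code prints a few warnings at runtime, most likely because PyTorch has been upgraded and the old function signatures are deprecated. They do not affect the run,
but the code can be changed as follows to silence them.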
pytorch_pretrained\optimization.py:275: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at ..\torch\csrc\utils\python_arg_parser.cpp:1025.)
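The line that triggers the warning:
next_m.mul_(beta1).add_(1 - beta1, grad)
Change the add_ call to add_(grad, alpha=1 - beta1):
next_m.mul_(beta1).add_(grad, alpha=1 - beta1)
The neighboring addcmul_ call takes the same fix. Keep the in-place form with the trailing underscore:
.addcmul_(grad, grad, value=1 - beta2)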
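To see that only the argument order changes and not the arithmetic, the sketch below spells out what the two updates compute. It uses plain Python lists so it runs without PyTorch. `update_moments` is a hypothetical helper, and the `next_v.mul_(beta2)` prefix in the docstring is an assumption that the second-moment line has the same shape as the first.

```python
def update_moments(m, v, grad, beta1=0.9, beta2=0.999):
    """Return the updated first and second moment lists.

    Element by element this mirrors the keyword-style calls:
        next_m.mul_(beta1).add_(grad, alpha=1 - beta1)
        next_v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    """
    # add_(other, alpha=a) computes self + a * other.
    new_m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    # addcmul_(t1, t2, value=c) computes self + c * t1 * t2.
    new_v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    return new_m, new_v


if __name__ == "__main__":
    m, v = update_moments([0.0, 0.0], [0.0, 0.0], [1.0, -2.0], beta1=0.5, beta2=0.5)
    print(m, v)  # [0.5, -1.0] [0.5, 2.0]
```

The deprecated positional form passed the scalar first and the tensor second. The keyword form passes the tensor first and names the scalar, which is all the warning asks for.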