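I. Project Overview
This project uses the large-model toolkit provided by PaddleNLP to fine-tune Baichuan2-7b/13b. LoRA training on the "Collection of Effective Case Records of TCM Treatment for COVID-19, Influenza, Mycoplasma and Other Infections" (《中医治疗新冠流感支原体感染等有效病历集》) gives the model the ability to diagnose and treat COVID-19, influenza and other upper respiratory tract infections using traditional Chinese medicine (TCM) protocols.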
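II. PaddleNLP
The PaddlePaddle large-model toolkit provided by PaddleNLP is designed around three ideas: a one-stop experience, extreme performance, and ecosystem compatibility. It aims to offer a unified workflow for pretraining, fine-tuning (including SFT and PEFT), quantization and inference of mainstream large models, so that developers can customize large language models quickly, at low cost and with a low barrier to entry. PaddleNLP supports fine-tuning strategies such as SFT, LoRA and Prefix Tuning for several mainstream large models and provides a unified, efficient fine-tuning solution:
1. Unified training entry point. The fine-tuning solution adapts to mainstream large models; users only need to modify a configuration file to fine-tune a variety of models on a single GPU or on multiple GPUs (4D parallel distributed strategies are supported).
2. Efficient data and distributed strategies. The Zero Padding optimization strategy reduces the proportion of pad tokens and improves training efficiency by up to 100%. An original combination of PEFT with low-bit and distributed parallel strategies substantially lowers the hardware bar for fine-tuning: it supports fine-tuning models at the 10-billion-parameter scale on a single GPU (A100 80G) and models at the 100-billion-parameter scale on a single machine (A100 80G * 8).
3. Multi-turn dialogue support. A unified dialogue template is supported, along with efficient multi-turn dialogue training; see the multi-turn dialogue documentation for details.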
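The effect of reducing pad tokens, as described in the second item above, can be illustrated with a small sketch. This is not PaddleNLP's Zero Padding implementation; it is a simplified first-fit packing routine, and the function names and the example lengths are hypothetical. A real implementation must also adjust attention masks and position ids so that packed sequences do not attend to each other.

```python
def pad_ratio_padded(lengths, max_len):
    """Fraction of pad tokens when every sequence is padded to max_len."""
    total = len(lengths) * max_len
    return (total - sum(lengths)) / total


def pack_first_fit(lengths, max_len):
    """Pack sequences into rows of max_len tokens using first-fit.

    Returns a list with the number of real tokens used in each row.
    """
    rows = []
    for n in lengths:
        if n > max_len:
            raise ValueError(f"sequence of length {n} exceeds max_len {max_len}")
        for i, used in enumerate(rows):
            if used + n <= max_len:
                rows[i] = used + n  # sequence fits into an existing row
                break
        else:
            rows.append(n)  # no row had room, so open a new one
    return rows


def pad_ratio_packed(lengths, max_len):
    """Fraction of pad tokens after first-fit packing."""
    rows = pack_first_fit(lengths, max_len)
    total = len(rows) * max_len
    return (total - sum(rows)) / total


# Hypothetical token lengths for six training samples.
lengths = [300, 200, 500, 100, 400, 548]
print(f"padded: {pad_ratio_padded(lengths, 1024):.2%}")
print(f"packed: {pad_ratio_packed(lengths, 1024):.2%}")
```

With these example lengths, packing halves the number of rows (from six to three), which is where the throughput gain comes from: fewer wasted positions per batch.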
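III. Baichuan2-7b/13b-chat
The Baichuan2 series is the latest work in deep learning from Baichuan Intelligence (百川智能); the fine-tuned models achieve excellent performance on multiple tasks. Open-sourcing these models gives developers a powerful tool for building more efficient and more accurate AI applications across a range of scenarios.
The Baichuan2 series is fully open source and follows through thoroughly on being free for commercial use. This does much to fill a gap in China's open-source ecosystem and gives Chinese developers an open-source large model that is better suited to Chinese-language scenarios.
The Baichuan2 models are also efficient: the quantized version of the 13-billion-parameter Baichuan2-13b can run fast inference even on a laptop with a consumer-grade GPU. We therefore chose the Baichuan2 series as the base model for this project.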
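To see why a quantized 13-billion-parameter model can fit on a consumer-grade GPU, a rough back-of-the-envelope estimate of weight storage helps. The sketch below is only an approximation under stated assumptions: it uses the nominal parameter count of 13 billion, counts weights only (no activations, KV cache or quantization metadata), and the bit widths are illustrative rather than the specific quantization scheme used by Baichuan2.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal parameter count taken from the model name; the exact count may differ.
NUM_PARAMS = 13e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights shrink from about 26 GB at 16 bits to about 6.5 GB at 4 bits, which is within reach of many consumer laptop GPUs.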
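IV. Training Data
The "Collection of Effective Case Records of TCM Treatment for COVID-19, Influenza, Mycoplasma and Other Infections" (《中医治疗新冠流感支原体感染等有效病历集》) was compiled by Yun Zhongyi (云中医). It contains effective case records for the TCM diagnosis and treatment of upper respiratory tract infections that have recently been widespread, covering colds, coughs and similar conditions caused by pathogens such as COVID-19, influenza A, mycoplasma, adenovirus and respiratory syncytial virus. During processing, the prescriptions and prescription drugs in the original records were de-emphasized, and treatment options based on OTC Chinese patent medicines and home dietary therapy were added. This avoids questions of medical qualification and possible disputes, and makes the data better suited to self-diagnosis and self-treatment of ordinary mild cases. Each record has two parts: case is the case record, and diagnosis is the diagnosis and prescription extracted from the case record. A sample record: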
{"case":"患者,男性,45岁,因新冠感染前来就诊。患者近日出现恶寒、无汗、后背痛的症状,并有发热、身痛、头痛。背部疼痛严重,影响日常生活。患者还表现出清涕、鼻塞、神疲乏力、声哑、无食欲等症状。舌淡苔白,脉紧。根据患者的主症和症状关联,考虑为葛根汤证。葛根汤为中医经典方剂,主要用于治疗风寒感冒,尤其对于恶寒、无汗、后背痛等症状有显著疗效。综上所述,患者新冠感染后出现恶寒、无汗、后背痛、发热、身痛、头痛等症状,考虑为葛根汤证。建议采用葛根汤进行治疗。",
"diagnosis":"诊断:太阳阳明伤寒 。建议处方:葛根汤。建议中成药:葛根汤颗粒或风寒感冒颗粒或感冒软胶囊 建议食疗:葱白姜汤"}
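The training data format supported by PaddleNLP is one dictionary per line, with each dictionary containing the following fields:
src : str, List(str), the model's input: the instruction or prompt describing the task the model should perform.
tgt : str, List(str), the model's output.
The training data therefore has to be converted into this format before training.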
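The conversion described above can be sketched as follows. This is a minimal example under stated assumptions, not the project's actual preprocessing script: it assumes the raw data is stored as one JSON object per line with case and diagnosis keys, and the instruction prefix, function names and file names are all hypothetical.

```python
import json

# Hypothetical instruction prefix; the source does not specify the prompt wording.
INSTRUCTION = "Based on the following case record, give a TCM diagnosis and treatment suggestions:\n"


def to_sft_record(record: dict) -> dict:
    """Map one {"case", "diagnosis"} record to the {"src", "tgt"} format."""
    return {
        "src": INSTRUCTION + record["case"].strip(),
        "tgt": record["diagnosis"].strip(),
    }


def convert_file(in_path: str, out_path: str) -> int:
    """Convert a JSON-lines file of raw records; return the number of records written."""
    count = 0
    with open(in_path, encoding="utf-8") as fin, \
            open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            converted = to_sft_record(json.loads(line))
            # ensure_ascii=False keeps the Chinese text readable in the output file
            fout.write(json.dumps(converted, ensure_ascii=False) + "\n")
            count += 1
    return count
```

A call such as `convert_file("cases.jsonl", "train.json")` would then write one src/tgt dictionary per line; both file names here are placeholders.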
V. Environment Setup
1. Get and install the latest version of PaddleNLP
In [1]
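# Clone the latest version directly from GitHub. If the network is a problem, clone from Gitee instead (the Gitee mirror may not be up to date, so GitHub is preferred).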
#!git clone https://gitee.com/PaddlePaddle/PaddleNLP
!git clone https://github.com/PaddlePaddle/PaddleNLP.git
Cloning into 'PaddleNLP'...
remote: Enumerating objects: 60471, done.
remote: Counting objects: 100% (578/578), done.
remote: Compressing objects: 100% (423/423), done.
remote: Total 60471 (delta 271), reused 382 (delta 144), pack-reused 59893
Receiving objects: 100% (60471/60471), 97.72 MiB | 15.36 MiB/s, done.
Resolving deltas: 100% (41419/41419), done.
In [2]
# Install the locally cloned version.
!pip install -r PaddleNLP/requirements.txt
!pip install -e ./PaddleNLP
Looking in indexes: https://mirror.baidu.com/pypi/simple/, https://mirrors.aliyun.com/pypi/simple/, https://pypi.tuna.tsinghua.edu.cn/simple/
Ignoring protobuf: markers 'platform_system == "Windows"' don't match your environment
Requirement already satisfied: jieba in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from -r PaddleNLP/requirements.txt (line 1)) (0.42.1)
...
Looking in indexes: https://mirror.baidu.com/pypi/simple/, https://mirrors.aliyun.com/pypi/simple/, https://pypi.tuna.tsinghua.edu.cn/simple/
Obtaining file:///home/aistudio/PaddleNLP
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Installing backend dependencies ... done
Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: jieba in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from paddlenlp==2.6.1.post0) (0.42.1)
...
Building wheels for collected packages: paddlenlp
Building editable for paddlenlp (pyproject.toml) ... done
Created wheel for paddlenlp: filename=paddlenlp-2.6.1.post0-0.editable-py3-none-any.whl size=15186 sha256=d63900491865a4c53fb8126468b30096bf5f9f684b281a3a5413724d608a6f40
Stored in directory: /tmp/pip-ephem-wheel-cache-dxh_79d_/wheels/ef/67/51/d39210219524142315c8b4babdd3bb2610f53d4d50639f381e
Successfully built paddlenlp
Installing collected packages: paddlenlp
Attempting uninstall: paddlenlp
Found existing installation: paddlenlp 2.6.1.post0
Uninstalling paddlenlp-2.6.1.post0:
Successfully uninstalled paddlenlp-2.6.1.post0
Successfully installed paddlenlp-2.6.1.post0
In [3]
# Check that the installation succeeded; restart the kernel at this point to make sure the package is usable
!pip list|grep paddlenlp
paddlenlp 2.6.1.post0 /home/aistudio/PaddleNLP
2. Get the Baichuan2-7B/13B-chat model
AI Studio already integrates the Baichuan2 model series, so a model can be loaded directly with the from_aistudio=True argument, as follows:
AutoModelForCausalLM.from_pretrained(
"aistudio/Baichuan2-7B-Chat", from_aistudio=True
)
With local deployment in mind, however, we clone the model first. This project uses the 7B model; choose whichever version suits your needs.
In [9]
# Cloning directly from AI Studio is the fastest option:
!git clone http://git.aistudio.baidu.com/aistudio/Baichuan2-7B-Chat.git
Cloning into 'Baichuan2-7B-Chat'... remote: Enumerating objects: 75, done. remote: Counting objects: 100% (75/75), done. remote: Compressing objects: 100% (74/74), done. remote: Total 75 (delta 30), reused 0 (delta 0), pack-reused 0 Unpacking objects: 100% (75/75), 13.65 KiB | 873.00 KiB/s, done. Filtering content: 100% (9/9), 3.96 GiB | 8.93 MiB/s, done. Encountered 6 files that may not have been copied correctly on Windows: model-00003-of-00004.safetensors model_state-00003-of-00004.pdparams model_state-00001-of-00004.pdparams model_state-00002-of-00004.pdparams model-00002-of-00004.safetensors model-00001-of-00004.safetensors See: `git lfs help smudge` for more details.
VI. Data Preparation
1. Convert the training data to the required format
In [5]
import json
import os
from sklearn.model_selection import train_test_split

# Read the JSON file
with open('data/data254538/RecentColdMedicalCase.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Split the dataset into a training set and a held-out dev set
train, dev = train_test_split(data, test_size=0.1, random_state=42)

# Make sure the output directory exists before writing to it
os.makedirs('TrainData', exist_ok=True)

# Convert to the src/tgt format required for training, one record per line
with open('TrainData/train.json', 'w', encoding="utf-8") as f:
    for item in train:
        temp = dict()
        temp['src'] = item['case']
        temp['tgt'] = item['diagnosis']
        json.dump(temp, f, ensure_ascii=False)
        f.write('\n')

with open('TrainData/dev.json', 'w', encoding="utf-8") as f:
    for item in dev:
        temp = dict()
        temp['src'] = item['case']
        temp['tgt'] = item['diagnosis']
        json.dump(temp, f, ensure_ascii=False)
        f.write('\n')
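Since the trainer expects one dictionary per line with `src` and `tgt` fields, a quick check of the generated files can catch format problems before a long training run. The following is a minimal sketch; the helper name `check_jsonl` is hypothetical and not part of PaddleNLP.

```python
import json


def check_jsonl(path, required=("src", "tgt")):
    """Count the records in a JSON-lines file, raising ValueError on a record
    that is missing one of the required keys."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [key for key in required if key not in record]
            if missing:
                raise ValueError(f"{path}:{line_no} is missing fields {missing}")
            count += 1
    return count
```

For example, `check_jsonl('TrainData/train.json')` and `check_jsonl('TrainData/dev.json')` should return the sizes of the two splits written above.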
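2. Edit the fine-tuning parameters
The file /home/aistudio/PaddleNLP/llm/llama/lora_argument.json presets the LoRA fine-tuning parameters, so they do not need to be passed on the command line. Edit the file directly: the main changes are the first two entries, the model path and the data path. The other parameters can be adjusted as needed, using the comments as a guide.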
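{
    # Built-in name of the pretrained model, or the directory that contains it; the default is facebook/llama-7b
    "model_name_or_path": "/home/aistudio/Baichuan2-7B-Chat",
    # Directory that contains the training data
    "dataset_name_or_path": "/home/aistudio/TrainData",
    # Directory where the model parameters are saved
    "output_dir": "./checkpoints/llama_lora_ckpts",
    # Training batch size per device
    "per_device_train_batch_size": 4,
    # Number of gradient accumulation steps, which can be used to enlarge the batch size. Effective batch_size = per_device_train_batch_size * gradient_accumulation_steps.
    "gradient_accumulation_steps": 4,
    # Evaluation batch size per device
    "per_device_eval_batch_size": 8,
    # Number of evaluation steps whose outputs are accumulated before being moved to the CPU
    "eval_accumulation_steps": 16,
    # Total number of training epochs (if not an integer, the fractional part of the last epoch is run before training stops)
    "num_train_epochs": 3,
    # Learning rate for parameter updates
    "learning_rate": 3e-04,
    # Number of learning-rate warmup steps
    "warmup_steps": 30,
    # Interval, in steps, between training log lines
    "logging_steps": 1,
    # Evaluation strategy: "no", "steps", or "epoch" (here, once per epoch)
    "evaluation_strategy": "epoch",
    # Checkpoint saving strategy
    "save_strategy": "epoch",
    # Maximum length of the input context; the default is 128
    "src_length": 1024,
    # Maximum total length of the model input (context plus generated content)
    "max_length": 2048,
    # Use float16 precision for training and inference
    "fp16": true,
    # float16 training mode; O2 means training almost entirely in float16
    "fp16_opt_level": "O2",
    # Whether to train the model
    "do_train": true,
    # Whether to evaluate the model
    "do_eval": true,
    # Whether to disable the tqdm progress bar
    "disable_tqdm": true,
    # Whether to load the best model when training ends
    "load_best_model_at_end": true,
    # Whether to call model.generate during evaluation; the default is False
    "eval_with_do_generation": false,
    # Metric used to compare models, such as loss or accuracy
    "metric_for_best_model": "accuracy",
    # Whether to use recomputation (activations are recomputed in the backward pass to save GPU memory)
    "recompute": true,
    # Maximum number of checkpoints to keep; older ones are deleted
    "save_total_limit": 1,
    # Tensor (model) parallel degree
    "tensor_parallel_degree": 1,
    # Pipeline parallel degree (number of pipeline stages)
    "pipeline_parallel_degree": 1,
    # Whether to use LoRA
    "lora": true,
    # Whether to use the Zero Padding strategy
    "zero_padding": false,
    # Whether to use Flash Attention
    "use_flash_attention": false
}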
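The two path entries can also be set from a script rather than by hand. The sketch below assumes the file on disk is plain JSON, because the `#` annotations shown above are explanatory only and standard JSON parsers reject comments. The helper names `update_lora_config` and `effective_batch_size` are hypothetical.

```python
import json
from pathlib import Path


def update_lora_config(config_path, model_path, data_path):
    """Load a plain-JSON argument file, set the model and data paths, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["model_name_or_path"] = model_path
    config["dataset_name_or_path"] = data_path
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return config


def effective_batch_size(config):
    """Per-device batch size multiplied by the gradient accumulation steps (single device)."""
    return config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
```

With the values used in this project, one would call `update_lora_config("/home/aistudio/PaddleNLP/llm/llama/lora_argument.json", "/home/aistudio/Baichuan2-7B-Chat", "/home/aistudio/TrainData")`; `effective_batch_size` then gives 4 * 4 = 16 for the settings above.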
VII. Training
1. Test the original model before training
In [3]
import json
import paddle
import get_result  # not defined in this notebook; presumably a local helper module
from paddlenlp.transformers import AutoModelForCausalLM, LlamaTokenizer

# Load the model and its weights
model = AutoModelForCausalLM.from_pretrained(
    '/home/aistudio/Baichuan2-7B-Chat',
    dtype="float16",
    tensor_parallel_degree=0,
    tensor_parallel_rank=0,
)
model.eval()
tokenizer = LlamaTokenizer.from_pretrained('/home/aistudio/Baichuan2-7B-Chat')
# Prompt (in Chinese): "I have a cold, with a slight cough, fever, and headache; I am thirsty but have difficulty urinating"
result = get_result.generate(model, tokenizer, "我感冒了,有点咳嗽,发热,头疼,有口渴但是小便不利")
print(result)
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") [2023-12-27 12:58:44,513] [ INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load '/home/aistudio/Baichuan2-7B-Chat'. [2023-12-27 12:58:44,514] [ INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/config.json [2023-12-27 12:58:44,518] [ INFO] - Loading weights file /home/aistudio/Baichuan2-7B-Chat/model.safetensors.index.json W1227 12:58:44.522776 2705 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8 W1227 12:58:44.524257 2705 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9. Loading checkpoint shards: 100%|██████████| 4/4 [03:48<00:00, 57.18s/it] [2023-12-27 13:02:48,099] [ INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM. [2023-12-27 13:02:48,100] [ INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at /home/aistudio/Baichuan2-7B-Chat. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. [2023-12-27 13:02:48,106] [ INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/generation_config.json
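。” 根据您提供的症状, 可能是由于外感风寒引起的感冒现象. 这是一种常见的疾病,可以通过服用一些药物来缓解症状并促进康复。然而,在开始任何药物治疗之前,请务必咨询专业医生的意见和建议;因为每个人的病情和体质不同,可能需要不同的治疗方案或用药剂量。以下是一些建议供您参考: 1. 多休息、多饮水以帮助身体排毒;避免食用辛辣刺激性食物以及油腻食物以减少对呼吸道的刺激 ;保持室内空气流通,以免空气过于干燥引起咽喉不适等症状加重</s>
(English translation of the output: Based on the symptoms you describe, this may be a cold caused by external wind-cold. This is a common illness, and taking some medication can relieve the symptoms and aid recovery. However, before starting any medication, be sure to consult a professional doctor, because each person's condition and constitution differ and may call for a different treatment plan or dosage. Here are some suggestions for reference: 1. Rest and drink plenty of water to help the body detoxify; avoid spicy, irritating, and greasy foods to reduce irritation of the respiratory tract; keep indoor air circulating so that overly dry air does not worsen symptoms such as throat discomfort)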
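The original model's answer is rather generic: it gives no precise diagnosis for the condition and no particularly effective treatment plan. Next, we fine-tune the model on the training data.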
2. Fine-tune the model
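It can help to estimate the length of the run before launching it. The sketch below assumes the trainer counts one optimizer step per `gradient_accumulation_steps` full batches and drops any leftover batches in each epoch. That assumption reproduces the 345 total optimization steps reported in the training log further down for 1,848 training examples, but it is not taken from the PaddleNLP source. The function name `estimate_optimizer_steps` is hypothetical.

```python
import math


def estimate_optimizer_steps(num_examples, per_device_batch_size,
                             gradient_accumulation_steps, num_epochs,
                             num_devices=1):
    """Estimate the total number of optimizer steps for a training run."""
    # Batches produced per epoch (the last partial batch is kept).
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One parameter update per full group of accumulated batches.
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    return math.ceil(num_epochs * updates_per_epoch)


# Values from lora_argument.json and the training log in this project
print(estimate_optimizer_steps(1848, 4, 4, 3))  # 345
```

At roughly 3 to 4 seconds per step, as the log lines below show, 345 steps take on the order of 20 minutes of pure training time, not counting evaluation and checkpointing.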
Before running the training cell below, restart the kernel to free GPU memory; otherwise training will run out of memory.
In [1]
%cd ~/PaddleNLP/llm/
# Single-GPU training
!python finetune_generation.py ./llama/lora_argument.json
# Distributed training
# Set tensor_parallel_degree to 2 in lora_argument.json
#python -u -m paddle.distributed.launch --gpus "0,1" finetune_generation.py ./llama/lora_argument.json
/home/aistudio/PaddleNLP/llm
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/IPython/core/magics/osm.py:393: UserWarning: using bookmarks requires you to install the `pickleshare` library. bkms = self.shell.db.get('bookmarks', {}) /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library. self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") [2023-12-26 17:37:10,698] [ INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-). [2023-12-26 17:37:10,699] [ INFO] - ============================================================ [2023-12-26 17:37:10,699] [ INFO] - Model Configuration Arguments [2023-12-26 17:37:10,699] [ INFO] - paddle commit id : 3a1b1659a405a044ce806fbe027cc146f1193e6d [2023-12-26 17:37:10,699] [ INFO] - paddlenlp commit id : 942865f52b42cd6e0666a19af316f32e151694eb.dirty [2023-12-26 17:37:10,699] [ INFO] - aistudio_repo_id : None [2023-12-26 17:37:10,699] [ INFO] - aistudio_repo_license : Apache License 2.0 [2023-12-26 17:37:10,699] [ INFO] - aistudio_repo_private : True [2023-12-26 17:37:10,699] [ INFO] - aistudio_token : None [2023-12-26 17:37:10,699] [ INFO] - from_aistudio : False [2023-12-26 17:37:10,699] [ INFO] - lora : True [2023-12-26 17:37:10,699] [ INFO] - lora_path : None [2023-12-26 17:37:10,699] [ INFO] - lora_rank : 8 [2023-12-26 17:37:10,700] [ INFO] - model_name_or_path : /home/aistudio/Baichuan2-7B-Chat [2023-12-26 17:37:10,700] [ INFO] - neftune : False [2023-12-26 17:37:10,700] [ INFO] - neftune_noise_alpha : 5.0 [2023-12-26 17:37:10,700] [ INFO] - num_prefix_tokens : 128 [2023-12-26 17:37:10,700] [ INFO] - prefix_tuning : False [2023-12-26 17:37:10,700] [ INFO] - save_to_aistudio : False [2023-12-26 17:37:10,700] [ INFO] - use_flash_attention : False [2023-12-26 17:37:10,700] [ INFO] - weight_blocksize : 64 [2023-12-26 17:37:10,700] [ INFO] - weight_double_quant : False [2023-12-26 17:37:10,700] [ INFO] - weight_double_quant_block_size: 256 [2023-12-26 
17:37:10,700] [ INFO] - weight_quantize_algo : None [2023-12-26 17:37:10,700] [ INFO] - [2023-12-26 17:37:10,700] [ INFO] - ============================================================ [2023-12-26 17:37:10,700] [ INFO] - Data Configuration Arguments [2023-12-26 17:37:10,700] [ INFO] - paddle commit id : 3a1b1659a405a044ce806fbe027cc146f1193e6d [2023-12-26 17:37:10,700] [ INFO] - paddlenlp commit id : 942865f52b42cd6e0666a19af316f32e151694eb.dirty [2023-12-26 17:37:10,700] [ INFO] - chat_template : None [2023-12-26 17:37:10,700] [ INFO] - dataset_name_or_path : /home/aistudio/TrainData [2023-12-26 17:37:10,701] [ INFO] - eval_with_do_generation : False [2023-12-26 17:37:10,701] [ INFO] - intokens : None [2023-12-26 17:37:10,701] [ INFO] - lazy : False [2023-12-26 17:37:10,701] [ INFO] - max_length : 2048 [2023-12-26 17:37:10,701] [ INFO] - save_generation_output : False [2023-12-26 17:37:10,701] [ INFO] - src_length : 1024 [2023-12-26 17:37:10,701] [ INFO] - task_name : None [2023-12-26 17:37:10,701] [ INFO] - task_name_or_path : None [2023-12-26 17:37:10,701] [ INFO] - zero_padding : False [2023-12-26 17:37:10,701] [ INFO] - [2023-12-26 17:37:10,701] [ INFO] - ============================================================ [2023-12-26 17:37:10,701] [ INFO] - Quant Configuration Arguments [2023-12-26 17:37:10,701] [ INFO] - paddle commit id : 3a1b1659a405a044ce806fbe027cc146f1193e6d [2023-12-26 17:37:10,701] [ INFO] - paddlenlp commit id : 942865f52b42cd6e0666a19af316f32e151694eb.dirty [2023-12-26 17:37:10,701] [ INFO] - do_gptq : False [2023-12-26 17:37:10,701] [ INFO] - do_ptq : False [2023-12-26 17:37:10,701] [ INFO] - do_qat : False [2023-12-26 17:37:10,702] [ INFO] - gptq_step : 8 [2023-12-26 17:37:10,702] [ INFO] - ptq_step : 32 [2023-12-26 17:37:10,702] [ INFO] - quant_type : a8w8 [2023-12-26 17:37:10,702] [ INFO] - shift : False [2023-12-26 17:37:10,702] [ INFO] - shift_all_linears : False [2023-12-26 17:37:10,702] [ INFO] - shift_sampler : ema [2023-12-26 
17:37:10,702] [ INFO] - shift_step : 32 [2023-12-26 17:37:10,702] [ INFO] - smooth : False [2023-12-26 17:37:10,702] [ INFO] - smooth_all_linears : False [2023-12-26 17:37:10,702] [ INFO] - smooth_k_piece : 3 [2023-12-26 17:37:10,702] [ INFO] - smooth_piecewise_search : False [2023-12-26 17:37:10,703] [ INFO] - smooth_sampler : none [2023-12-26 17:37:10,703] [ INFO] - smooth_search_piece : False [2023-12-26 17:37:10,703] [ INFO] - smooth_step : 32 [2023-12-26 17:37:10,703] [ INFO] - [2023-12-26 17:37:10,703] [ INFO] - ============================================================ [2023-12-26 17:37:10,703] [ INFO] - Generation Configuration Arguments [2023-12-26 17:37:10,703] [ INFO] - paddle commit id : 3a1b1659a405a044ce806fbe027cc146f1193e6d [2023-12-26 17:37:10,703] [ INFO] - paddlenlp commit id : 942865f52b42cd6e0666a19af316f32e151694eb.dirty [2023-12-26 17:37:10,703] [ INFO] - top_k : 1 [2023-12-26 17:37:10,703] [ INFO] - top_p : 1.0 [2023-12-26 17:37:10,703] [ INFO] - [2023-12-26 17:37:10,703] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: True [2023-12-26 17:37:10,704] [ INFO] - We are using <class 'paddlenlp.transformers.llama.configuration.LlamaConfig'> to load '/home/aistudio/Baichuan2-7B-Chat'. [2023-12-26 17:37:10,704] [ INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/config.json [2023-12-26 17:37:10,705] [ INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load '/home/aistudio/Baichuan2-7B-Chat'. [2023-12-26 17:37:10,706] [ INFO] - Loading weights file /home/aistudio/Baichuan2-7B-Chat/model.safetensors.index.json W1226 17:37:10.709461 26242 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8 W1226 17:37:10.710600 26242 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9. 
Loading checkpoint shards: 100%|██████████████████| 4/4 [04:16<00:00, 64.11s/it] [2023-12-26 17:41:56,850] [ INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM. [2023-12-26 17:41:56,850] [ INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at /home/aistudio/Baichuan2-7B-Chat. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. [2023-12-26 17:41:56,853] [ INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/generation_config.json [2023-12-26 17:41:56,853] [ INFO] - We are using <class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'> to load '/home/aistudio/Baichuan2-7B-Chat'. Downloading data files: 100%|███████████████████| 1/1 [00:00<00:00, 7436.71it/s] Extracting data files: 100%|████████████████████| 1/1 [00:00<00:00, 1189.87it/s] Generating train split: 1848 examples [00:00, 108999.65 examples/s] Downloading data files: 100%|██████████████████| 1/1 [00:00<00:00, 11214.72it/s] Extracting data files: 100%|████████████████████| 1/1 [00:00<00:00, 1536.38it/s] Generating train split: 206 examples [00:00, 77987.78 examples/s] [2023-12-26 17:42:21,202] [ INFO] - Frozen parameters: 7.51e+09 || Trainable parameters:2.00e+07 || Total parameters:7.53e+09|| Trainable:0.27% [2023-12-26 17:42:21,202] [ INFO] - The global seed is set to 42, local seed is set to 43 and random seed is set to 42. 
[2023-12-26 17:42:21,238] [ INFO] - Using half precision [2023-12-26 17:42:21,268] [ INFO] - ============================================================ [2023-12-26 17:42:21,268] [ INFO] - Training Configuration Arguments [2023-12-26 17:42:21,268] [ INFO] - paddle commit id : 3a1b1659a405a044ce806fbe027cc146f1193e6d [2023-12-26 17:42:21,268] [ INFO] - paddlenlp commit id : 942865f52b42cd6e0666a19af316f32e151694eb.dirty [2023-12-26 17:42:21,268] [ INFO] - _no_sync_in_gradient_accumulation: True [2023-12-26 17:42:21,268] [ INFO] - adam_beta1 : 0.9 [2023-12-26 17:42:21,268] [ INFO] - adam_beta2 : 0.999 [2023-12-26 17:42:21,268] [ INFO] - adam_epsilon : 1e-08 [2023-12-26 17:42:21,268] [ INFO] - amp_custom_black_list : None [2023-12-26 17:42:21,268] [ INFO] - amp_custom_white_list : None [2023-12-26 17:42:21,268] [ INFO] - amp_master_grad : False [2023-12-26 17:42:21,269] [ INFO] - autotuner_benchmark : False [2023-12-26 17:42:21,269] [ INFO] - benchmark : False [2023-12-26 17:42:21,269] [ INFO] - bf16 : False [2023-12-26 17:42:21,269] [ INFO] - bf16_full_eval : False [2023-12-26 17:42:21,269] [ INFO] - current_device : gpu:0 [2023-12-26 17:42:21,269] [ INFO] - data_parallel_rank : 0 [2023-12-26 17:42:21,269] [ INFO] - dataloader_drop_last : False [2023-12-26 17:42:21,269] [ INFO] - dataloader_num_workers : 0 [2023-12-26 17:42:21,269] [ INFO] - dataset_rank : 0 [2023-12-26 17:42:21,269] [ INFO] - dataset_world_size : 1 [2023-12-26 17:42:21,269] [ INFO] - device : gpu [2023-12-26 17:42:21,269] [ INFO] - disable_tqdm : True [2023-12-26 17:42:21,269] [ INFO] - distributed_dataloader : False [2023-12-26 17:42:21,269] [ INFO] - do_eval : True [2023-12-26 17:42:21,269] [ INFO] - do_export : False [2023-12-26 17:42:21,269] [ INFO] - do_predict : False [2023-12-26 17:42:21,269] [ INFO] - do_train : True [2023-12-26 17:42:21,269] [ INFO] - eval_accumulation_steps : 16 [2023-12-26 17:42:21,270] [ INFO] - eval_batch_size : 8 [2023-12-26 17:42:21,270] [ INFO] - eval_steps : None 
[2023-12-26 17:42:21,270] [ INFO] - evaluation_strategy : IntervalStrategy.EPOCH [2023-12-26 17:42:21,270] [ INFO] - flatten_param_grads : False [2023-12-26 17:42:21,270] [ INFO] - force_reshard_pp : False [2023-12-26 17:42:21,270] [ INFO] - fp16 : True [2023-12-26 17:42:21,270] [ INFO] - fp16_full_eval : False [2023-12-26 17:42:21,270] [ INFO] - fp16_opt_level : O2 [2023-12-26 17:42:21,270] [ INFO] - gradient_accumulation_steps : 4 [2023-12-26 17:42:21,270] [ INFO] - greater_is_better : True [2023-12-26 17:42:21,270] [ INFO] - hybrid_parallel_topo_order : None [2023-12-26 17:42:21,270] [ INFO] - ignore_data_skip : False [2023-12-26 17:42:21,270] [ INFO] - ignore_load_lr_and_optim : False [2023-12-26 17:42:21,270] [ INFO] - label_names : None [2023-12-26 17:42:21,270] [ INFO] - lazy_data_processing : True [2023-12-26 17:42:21,270] [ INFO] - learning_rate : 0.0003 [2023-12-26 17:42:21,270] [ INFO] - load_best_model_at_end : True [2023-12-26 17:42:21,270] [ INFO] - load_sharded_model : False [2023-12-26 17:42:21,270] [ INFO] - local_process_index : 0 [2023-12-26 17:42:21,271] [ INFO] - local_rank : -1 [2023-12-26 17:42:21,271] [ INFO] - log_level : -1 [2023-12-26 17:42:21,271] [ INFO] - log_level_replica : -1 [2023-12-26 17:42:21,271] [ INFO] - log_on_each_node : True [2023-12-26 17:42:21,271] [ INFO] - logging_dir : ./checkpoints/llama_lora_ckpts/runs/Dec26_17-37-10_jupyter-3484865-7331292 [2023-12-26 17:42:21,271] [ INFO] - logging_first_step : False [2023-12-26 17:42:21,271] [ INFO] - logging_steps : 1 [2023-12-26 17:42:21,271] [ INFO] - logging_strategy : IntervalStrategy.STEPS [2023-12-26 17:42:21,271] [ INFO] - logical_process_index : 0 [2023-12-26 17:42:21,271] [ INFO] - lr_end : 1e-07 [2023-12-26 17:42:21,271] [ INFO] - lr_scheduler_type : SchedulerType.LINEAR [2023-12-26 17:42:21,271] [ INFO] - max_evaluate_steps : -1 [2023-12-26 17:42:21,271] [ INFO] - max_grad_norm : 1.0 [2023-12-26 17:42:21,271] [ INFO] - max_steps : -1 [2023-12-26 17:42:21,271] [ INFO] - 
metric_for_best_model : accuracy [2023-12-26 17:42:21,271] [ INFO] - minimum_eval_times : None [2023-12-26 17:42:21,271] [ INFO] - no_cuda : False [2023-12-26 17:42:21,271] [ INFO] - num_cycles : 0.5 [2023-12-26 17:42:21,272] [ INFO] - num_train_epochs : 3 [2023-12-26 17:42:21,272] [ INFO] - optim : OptimizerNames.ADAMW [2023-12-26 17:42:21,272] [ INFO] - optimizer_name_suffix : None [2023-12-26 17:42:21,272] [ INFO] - output_dir : ./checkpoints/llama_lora_ckpts [2023-12-26 17:42:21,272] [ INFO] - overwrite_output_dir : False [2023-12-26 17:42:21,272] [ INFO] - past_index : -1 [2023-12-26 17:42:21,272] [ INFO] - per_device_eval_batch_size : 8 [2023-12-26 17:42:21,272] [ INFO] - per_device_train_batch_size : 4 [2023-12-26 17:42:21,272] [ INFO] - pipeline_parallel_config : [2023-12-26 17:42:21,272] [ INFO] - pipeline_parallel_degree : -1 [2023-12-26 17:42:21,272] [ INFO] - pipeline_parallel_rank : 0 [2023-12-26 17:42:21,272] [ INFO] - power : 1.0 [2023-12-26 17:42:21,272] [ INFO] - prediction_loss_only : False [2023-12-26 17:42:21,272] [ INFO] - process_index : 0 [2023-12-26 17:42:21,272] [ INFO] - recompute : True [2023-12-26 17:42:21,272] [ INFO] - remove_unused_columns : True [2023-12-26 17:42:21,272] [ INFO] - report_to : ['visualdl'] [2023-12-26 17:42:21,272] [ INFO] - resume_from_checkpoint : None [2023-12-26 17:42:21,273] [ INFO] - run_name : ./checkpoints/llama_lora_ckpts [2023-12-26 17:42:21,273] [ INFO] - save_on_each_node : False [2023-12-26 17:42:21,273] [ INFO] - save_sharded_model : False [2023-12-26 17:42:21,273] [ INFO] - save_steps : 500 [2023-12-26 17:42:21,273] [ INFO] - save_strategy : IntervalStrategy.EPOCH [2023-12-26 17:42:21,273] [ INFO] - save_total_limit : 1 [2023-12-26 17:42:21,273] [ INFO] - scale_loss : 32768 [2023-12-26 17:42:21,273] [ INFO] - seed : 42 [2023-12-26 17:42:21,273] [ INFO] - sep_parallel_degree : -1 [2023-12-26 17:42:21,273] [ INFO] - sharding : [] [2023-12-26 17:42:21,273] [ INFO] - sharding_degree : -1 [2023-12-26 
17:42:21,273] [ INFO] - sharding_parallel_config : [2023-12-26 17:42:21,273] [ INFO] - sharding_parallel_degree : -1 [2023-12-26 17:42:21,273] [ INFO] - sharding_parallel_rank : 0 [2023-12-26 17:42:21,273] [ INFO] - should_load_dataset : True [2023-12-26 17:42:21,273] [ INFO] - should_load_sharding_stage1_model: False [2023-12-26 17:42:21,273] [ INFO] - should_log : True [2023-12-26 17:42:21,273] [ INFO] - should_save : True [2023-12-26 17:42:21,273] [ INFO] - should_save_model_state : True [2023-12-26 17:42:21,274] [ INFO] - should_save_sharding_stage1_model: False [2023-12-26 17:42:21,274] [ INFO] - skip_memory_metrics : True [2023-12-26 17:42:21,274] [ INFO] - skip_profile_timer : True [2023-12-26 17:42:21,274] [ INFO] - tensor_parallel_config : [2023-12-26 17:42:21,274] [ INFO] - tensor_parallel_degree : -1 [2023-12-26 17:42:21,274] [ INFO] - tensor_parallel_rank : 0 [2023-12-26 17:42:21,274] [ INFO] - to_static : False [2023-12-26 17:42:21,274] [ INFO] - train_batch_size : 4 [2023-12-26 17:42:21,274] [ INFO] - unified_checkpoint : False [2023-12-26 17:42:21,274] [ INFO] - use_auto_parallel : False [2023-12-26 17:42:21,274] [ INFO] - use_hybrid_parallel : False [2023-12-26 17:42:21,274] [ INFO] - warmup_ratio : 0.0 [2023-12-26 17:42:21,274] [ INFO] - warmup_steps : 30 [2023-12-26 17:42:21,274] [ INFO] - weight_decay : 0.0 [2023-12-26 17:42:21,274] [ INFO] - weight_name_suffix : None [2023-12-26 17:42:21,274] [ INFO] - world_size : 1 [2023-12-26 17:42:21,274] [ INFO] - [2023-12-26 17:42:21,274] [ INFO] - Starting training from resume_from_checkpoint : None /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/distributed/parallel.py:411: UserWarning: The program will return to single-card operation. Please check 1, whether you use spawn or fleetrun to start the program. 2, Whether it is a multi-card program. 3, Is the current environment multi-card. 
warnings.warn( [2023-12-26 17:42:21,280] [ INFO] - ***** Running training ***** [2023-12-26 17:42:21,280] [ INFO] - Num examples = 1,848 [2023-12-26 17:42:21,280] [ INFO] - Num Epochs = 3 [2023-12-26 17:42:21,281] [ INFO] - Instantaneous batch size per device = 4 [2023-12-26 17:42:21,281] [ INFO] - Total train batch size (w. parallel, distributed & accumulation) = 16 [2023-12-26 17:42:21,281] [ INFO] - Gradient Accumulation steps = 4 [2023-12-26 17:42:21,281] [ INFO] - Total optimization steps = 345 [2023-12-26 17:42:21,281] [ INFO] - Total num train samples = 5,544 [2023-12-26 17:42:21,285] [ INFO] - Number of trainable parameters = 19,988,480 (per device) [2023-12-26 17:42:24,950] [ INFO] - loss: 4.04572821, learning_rate: 1e-05, global_step: 1, interval_runtime: 3.6647, interval_samples_per_second: 4.365975805930098, interval_steps_per_second: 0.27287348787063115, ppl: 57.15279011734303, epoch: 0.0087 [2023-12-26 17:42:28,307] [ INFO] - loss: 4.65756416, learning_rate: 2e-05, global_step: 2, interval_runtime: 3.3567, interval_samples_per_second: 4.766609671416571, interval_steps_per_second: 0.2979131044635357, ppl: 105.37908269415595, epoch: 0.0173 [2023-12-26 17:42:31,806] [ INFO] - loss: 4.39336109, learning_rate: 3e-05, global_step: 3, interval_runtime: 3.4991, interval_samples_per_second: 4.572590091463571, interval_steps_per_second: 0.2857868807164732, ppl: 80.91191469147887, epoch: 0.026 [2023-12-26 17:42:35,023] [ INFO] - loss: 4.2095747, learning_rate: 4e-05, global_step: 4, interval_runtime: 3.2167, interval_samples_per_second: 4.974087316240644, interval_steps_per_second: 0.31088045726504027, ppl: 67.32789916460032, epoch: 0.0346 [2023-12-26 17:42:38,280] [ INFO] - loss: 4.45522022, learning_rate: 5e-05, global_step: 5, interval_runtime: 3.2576, interval_samples_per_second: 4.911592329911189, interval_steps_per_second: 0.3069745206194493, ppl: 86.07510421755674, epoch: 0.0433 [2023-12-26 17:42:42,170] [ INFO] - loss: 4.172194, learning_rate: 6e-05, 
global_step: 6, interval_runtime: 3.89, interval_samples_per_second: 4.113123033615344, interval_steps_per_second: 0.257070189600959, ppl: 64.85759368161546, epoch: 0.0519 [2023-12-26 17:42:45,784] [ INFO] - loss: 3.75121832, learning_rate: 7e-05, global_step: 7, interval_runtime: 3.6141, interval_samples_per_second: 4.427085699961078, interval_steps_per_second: 0.2766928562475674, ppl: 42.57291785460257, epoch: 0.0606 [2023-12-26 17:42:49,598] [ INFO] - loss: 3.57292008, learning_rate: 8e-05, global_step: 8, interval_runtime: 3.8137, interval_samples_per_second: 4.195439390785131, interval_steps_per_second: 0.2622149619240707, ppl: 35.62045601509179, epoch: 0.0693 [2023-12-26 17:42:52,626] [ INFO] - loss: 3.01917839, learning_rate: 9e-05, global_step: 9, interval_runtime: 3.028, interval_samples_per_second: 5.2839536578947826, interval_steps_per_second: 0.3302471036184239, ppl: 20.47446274838995, epoch: 0.0779 [2023-12-26 17:42:55,893] [ INFO] - loss: 2.75773215, learning_rate: 0.0001, global_step: 10, interval_runtime: 3.2669, interval_samples_per_second: 4.897542422550416, interval_steps_per_second: 0.306096401409401, ppl: 15.764051874163764, epoch: 0.0866 [2023-12-26 17:42:59,152] [ INFO] - loss: 2.5989778, learning_rate: 0.00011, global_step: 11, interval_runtime: 3.2584, interval_samples_per_second: 4.9104358222759465, interval_steps_per_second: 0.30690223889224666, ppl: 13.449982433667916, epoch: 0.0952 [2023-12-26 17:43:02,654] [ INFO] - loss: 2.21501446, learning_rate: 0.00012, global_step: 12, interval_runtime: 3.5021, interval_samples_per_second: 4.5686416499291065, interval_steps_per_second: 0.28554010312056916, ppl: 9.161541586542349, epoch: 0.1039 [2023-12-26 17:43:05,641] [ INFO] - loss: 2.03604507, learning_rate: 0.00013, global_step: 13, interval_runtime: 2.9872, interval_samples_per_second: 5.356214143645703, interval_steps_per_second: 0.33476338397785643, ppl: 7.660253444848598, epoch: 0.1126 [2023-12-26 17:43:09,544] [ INFO] - loss: 1.90918612, 
learning_rate: 0.00014, global_step: 14, interval_runtime: 3.9035, interval_samples_per_second: 4.09884021807836, interval_steps_per_second: 0.2561775136298975, ppl: 6.7475948306385, epoch: 0.1212 [2023-12-26 17:43:13,583] [ INFO] - loss: 1.76850057, learning_rate: 0.00015, global_step: 15, interval_runtime: 4.0385, interval_samples_per_second: 3.9618929230574502, interval_steps_per_second: 0.24761830769109064, ppl: 5.862057024120994, epoch: 0.1299 [2023-12-26 17:43:17,162] [ INFO] - loss: 1.59435368, learning_rate: 0.00016, global_step: 16, interval_runtime: 3.5791, interval_samples_per_second: 4.470455789035147, interval_steps_per_second: 0.2794034868146967, ppl: 4.925144823605828, epoch: 0.1385 [2023-12-26 17:43:20,310] [ INFO] - loss: 1.55910134, learning_rate: 0.00017, global_step: 17, interval_runtime: 3.1482, interval_samples_per_second: 5.082190660424854, interval_steps_per_second: 0.3176369162765534, ppl: 4.754546603849948, epoch: 0.1472 [2023-12-26 17:43:24,120] [ INFO] - loss: 1.37142038, learning_rate: 0.00018, global_step: 18, interval_runtime: 3.8102, interval_samples_per_second: 4.199265432108036, interval_steps_per_second: 0.26245408950675225, ppl: 3.940944360515859, epoch: 0.1558 [2023-12-26 17:43:27,430] [ INFO] - loss: 1.26009345, learning_rate: 0.00019, global_step: 19, interval_runtime: 3.3092, interval_samples_per_second: 4.835006761991377, interval_steps_per_second: 0.30218792262446104, ppl: 3.5257509533974374, epoch: 0.1645 [2023-12-26 17:43:30,616] [ INFO] - loss: 1.38265204, learning_rate: 0.0002, global_step: 20, interval_runtime: 3.1861, interval_samples_per_second: 5.021890906010552, interval_steps_per_second: 0.3138681816256595, ppl: 3.985457216342121, epoch: 0.1732 [2023-12-26 17:43:34,073] [ INFO] - loss: 1.34700322, learning_rate: 0.00021, global_step: 21, interval_runtime: 3.4576, interval_samples_per_second: 4.627547107516889, interval_steps_per_second: 0.28922169421980554, ppl: 3.845882978897622, epoch: 0.1818 [2023-12-26 
17:43:37,613] [ INFO] - loss: 0.96020913, learning_rate: 0.00022, global_step: 22, interval_runtime: 3.5402, interval_samples_per_second: 4.519494023607096, interval_steps_per_second: 0.2824683764754435, ppl: 2.6122427146223246, epoch: 0.1905 [2023-12-26 17:43:41,826] [ INFO] - loss: 0.81633461, learning_rate: 0.00023, global_step: 23, interval_runtime: 4.2123, interval_samples_per_second: 3.798402662854955, interval_steps_per_second: 0.2374001664284347, ppl: 2.262192803692768, epoch: 0.1991 [2023-12-26 17:43:46,109] [ INFO] - loss: 0.92583209, learning_rate: 0.00024, global_step: 24, interval_runtime: 4.2829, interval_samples_per_second: 3.735805341596222, interval_steps_per_second: 0.23348783384976388, ppl: 2.523967554998823, epoch: 0.2078 [2023-12-26 17:43:49,823] [ INFO] - loss: 0.97212011, learning_rate: 0.00025, global_step: 25, interval_runtime: 3.7141, interval_samples_per_second: 4.307892351580294, interval_steps_per_second: 0.26924327197376835, ppl: 2.643543124577833, epoch: 0.2165 [2023-12-26 17:43:53,004] [ INFO] - loss: 0.68968725, learning_rate: 0.00026, global_step: 26, interval_runtime: 3.1813, interval_samples_per_second: 5.029367627441282, interval_steps_per_second: 0.3143354767150801, ppl: 1.9930920962051093, epoch: 0.2251 [2023-12-26 17:43:56,862] [ INFO] - loss: 0.75413704, learning_rate: 0.00027, global_step: 27, interval_runtime: 3.8575, interval_samples_per_second: 4.1477499141974175, interval_steps_per_second: 0.2592343696373386, ppl: 2.1257762717033786, epoch: 0.2338 [2023-12-26 17:43:59,763] [ INFO] - loss: 0.63414562, learning_rate: 0.00028, global_step: 28, interval_runtime: 2.9012, interval_samples_per_second: 5.514889879483934, interval_steps_per_second: 0.34468061746774586, ppl: 1.885410596015647, epoch: 0.2424 [2023-12-26 17:44:03,739] [ INFO] - loss: 0.6446268, learning_rate: 0.00029, global_step: 29, interval_runtime: 3.9758, interval_samples_per_second: 4.024363191163546, interval_steps_per_second: 0.2515226994477216, ppl: 
1.9052758476273477, epoch: 0.2511 [2023-12-26 17:44:07,650] [ INFO] - loss: 0.63658696, learning_rate: 0.0003, global_step: 30, interval_runtime: 3.9109, interval_samples_per_second: 4.091122564876725, interval_steps_per_second: 0.25569516030479533, ppl: 1.8900191475517596, epoch: 0.2597 [2023-12-26 17:44:11,223] [ INFO] - loss: 0.56769204, learning_rate: 0.000299, global_step: 31, interval_runtime: 3.5735, interval_samples_per_second: 4.477359595438927, interval_steps_per_second: 0.27983497471493296, ppl: 1.7641906676844925, epoch: 0.2684 [2023-12-26 17:44:14,480] [ INFO] - loss: 0.51316339, learning_rate: 0.0002981, global_step: 32, interval_runtime: 3.2572, interval_samples_per_second: 4.912153169838064, interval_steps_per_second: 0.307009573114879, ppl: 1.6705675015668517, epoch: 0.2771 [2023-12-26 17:44:17,453] [ INFO] - loss: 0.54714298, learning_rate: 0.0002971, global_step: 33, interval_runtime: 2.9726, interval_samples_per_second: 5.382444256551639, interval_steps_per_second: 0.33640276603447744, ppl: 1.7283081464920642, epoch: 0.2857 [2023-12-26 17:44:20,106] [ INFO] - loss: 0.5057705, learning_rate: 0.0002962, global_step: 34, interval_runtime: 2.6522, interval_samples_per_second: 6.03280133930277, interval_steps_per_second: 0.3770500837064231, ppl: 1.6582627197822182, epoch: 0.2944 [2023-12-26 17:44:22,814] [ INFO] - loss: 0.44090223, learning_rate: 0.0002952, global_step: 35, interval_runtime: 2.7086, interval_samples_per_second: 5.907133986294971, interval_steps_per_second: 0.3691958741434357, ppl: 1.5541087497017636, epoch: 0.303 [2023-12-26 17:44:26,301] [ INFO] - loss: 0.41087997, learning_rate: 0.0002943, global_step: 36, interval_runtime: 3.4869, interval_samples_per_second: 4.588559640353671, interval_steps_per_second: 0.2867849775221044, ppl: 1.508144323130449, epoch: 0.3117 [2023-12-26 17:44:29,741] [ INFO] - loss: 0.36454743, learning_rate: 0.0002933, global_step: 37, interval_runtime: 3.4403, interval_samples_per_second: 4.65081652284074, 
interval_steps_per_second: 0.29067603267754627, ppl: 1.439862222225052, epoch: 0.3203 [2023-12-26 17:44:33,123] [ INFO] - loss: 0.34224176, learning_rate: 0.0002924, global_step: 38, interval_runtime: 3.3821, interval_samples_per_second: 4.730738726062611, interval_steps_per_second: 0.29567117037891316, ppl: 1.40810067878726, epoch: 0.329 [2023-12-26 17:44:36,322] [ INFO] - loss: 0.40164879, learning_rate: 0.0002914, global_step: 39, interval_runtime: 3.1989, interval_samples_per_second: 5.001789822888476, interval_steps_per_second: 0.31261186393052975, ppl: 1.4942864321684426, epoch: 0.3377 [2023-12-26 17:44:39,941] [ INFO] - loss: 0.34734392, learning_rate: 0.0002905, global_step: 40, interval_runtime: 3.6195, interval_samples_per_second: 4.420510938298477, interval_steps_per_second: 0.27628193364365483, ppl: 1.415303392821156, epoch: 0.3463 [2023-12-26 17:44:43,331] [ INFO] - loss: 0.34797683, learning_rate: 0.0002895, global_step: 41, interval_runtime: 3.3899, interval_samples_per_second: 4.719858802727873, interval_steps_per_second: 0.29499117517049206, ppl: 1.4161994360189456, epoch: 0.355 [2023-12-26 17:44:46,994] [ INFO] - loss: 0.3465372, learning_rate: 0.0002886, global_step: 42, interval_runtime: 3.6628, interval_samples_per_second: 4.368268092655978, interval_steps_per_second: 0.2730167557909986, ppl: 1.4141620996819957, epoch: 0.3636 [2023-12-26 17:44:50,066] [ INFO] - loss: 0.35139573, learning_rate: 0.0002876, global_step: 43, interval_runtime: 3.0719, interval_samples_per_second: 5.20858310791659, interval_steps_per_second: 0.3255364442447869, ppl: 1.4210495666020952, epoch: 0.3723 [2023-12-26 17:44:53,690] [ INFO] - loss: 0.3359192, learning_rate: 0.0002867, global_step: 44, interval_runtime: 3.6242, interval_samples_per_second: 4.414716444232063, interval_steps_per_second: 0.27591977776450394, ppl: 1.3992259627854928, epoch: 0.381 [2023-12-26 17:44:57,252] [ INFO] - loss: 0.332609, learning_rate: 0.0002857, global_step: 45, interval_runtime: 
3.5619, interval_samples_per_second: 4.492029541467871, interval_steps_per_second: 0.28075184634174194, ppl: 1.3946019025079608, epoch: 0.3896 [2023-12-26 17:45:00,691] [ INFO] - loss: 0.30754462, learning_rate: 0.0002848, global_step: 46, interval_runtime: 3.4393, interval_samples_per_second: 4.6521041994702985, interval_steps_per_second: 0.29075651246689366, ppl: 1.360081493984319, epoch: 0.3983 [2023-12-26 17:45:04,107] [ INFO] - loss: 0.2745533, learning_rate: 0.0002838, global_step: 47, interval_runtime: 3.4151, interval_samples_per_second: 4.685137446638551, interval_steps_per_second: 0.29282109041490945, ppl: 1.3159427119464535, epoch: 0.4069 [2023-12-26 17:45:07,275] [ INFO] - loss: 0.33314824, learning_rate: 0.0002829, global_step: 48, interval_runtime: 3.168, interval_samples_per_second: 5.050457186622707, interval_steps_per_second: 0.3156535741639192, ppl: 1.3953541304353352, epoch: 0.4156 [2023-12-26 17:45:11,373] [ INFO] - loss: 0.30882508, learning_rate: 0.0002819, global_step: 49, interval_runtime: 4.0984, interval_samples_per_second: 3.903920851275084, interval_steps_per_second: 0.24399505320469275, ppl: 1.361824139389874, epoch: 0.4242 [2023-12-26 17:45:14,388] [ INFO] - loss: 0.30945367, learning_rate: 0.000281, global_step: 50, interval_runtime: 3.0149, interval_samples_per_second: 5.306904905470886, interval_steps_per_second: 0.3316815565919304, ppl: 1.3626804375276809, epoch: 0.4329 [2023-12-26 17:45:17,930] [ INFO] - loss: 0.29789728, learning_rate: 0.00028, global_step: 51, interval_runtime: 3.5425, interval_samples_per_second: 4.5166080190902775, interval_steps_per_second: 0.28228800119314235, ppl: 1.34702341452768, epoch: 0.4416 [2023-12-26 17:45:21,610] [ INFO] - loss: 0.28248543, learning_rate: 0.000279, global_step: 52, interval_runtime: 3.6793, interval_samples_per_second: 4.348656011857886, interval_steps_per_second: 0.2717910007411179, ppl: 1.3264224489808771, epoch: 0.4502 [2023-12-26 17:45:25,480] [ INFO] - loss: 0.28851509, 
learning_rate: 0.0002781, global_step: 53, interval_runtime: 3.8697, interval_samples_per_second: 4.134670933214908, interval_steps_per_second: 0.25841693332593174, ppl: 1.3344444861382638, epoch: 0.4589 [2023-12-26 17:45:28,597] [ INFO] - loss: 0.26777136, learning_rate: 0.0002771, global_step: 54, interval_runtime: 3.1172, interval_samples_per_second: 5.132893716860017, interval_steps_per_second: 0.32080585730375105, ppl: 1.3070482623338415, epoch: 0.4675 [2023-12-26 17:45:31,972] [ INFO] - loss: 0.30797887, learning_rate: 0.0002762, global_step: 55, interval_runtime: 3.3754, interval_samples_per_second: 4.740216348520788, interval_steps_per_second: 0.2962635217825493, ppl: 1.3606722376290126, epoch: 0.4762 [2023-12-26 17:45:35,789] [ INFO] - loss: 0.25563985, learning_rate: 0.0002752, global_step: 56, interval_runtime: 3.8169, interval_samples_per_second: 4.191847577319691, interval_steps_per_second: 0.2619904735824807, ppl: 1.2912875869600262, epoch: 0.4848 [2023-12-26 17:45:39,008] [ INFO] - loss: 0.27637056, learning_rate: 0.0002743, global_step: 57, interval_runtime: 3.2185, interval_samples_per_second: 4.971223970727729, interval_steps_per_second: 0.31070149817048304, ppl: 1.3183362962229255, epoch: 0.4935 [2023-12-26 17:45:42,650] [ INFO] - loss: 0.29341272, learning_rate: 0.0002733, global_step: 58, interval_runtime: 3.63, interval_samples_per_second: 4.407697726782218, interval_steps_per_second: 0.27548110792388864, ppl: 1.3409961321598929, epoch: 0.5022 [2023-12-26 17:45:46,185] [ INFO] - loss: 0.2738843, learning_rate: 0.0002724, global_step: 59, interval_runtime: 3.5479, interval_samples_per_second: 4.50970456331901, interval_steps_per_second: 0.28185653520743814, ppl: 1.3150626406888208, epoch: 0.5108 [2023-12-26 17:45:49,729] [ INFO] - loss: 0.29272103, learning_rate: 0.0002714, global_step: 60, interval_runtime: 3.5435, interval_samples_per_second: 4.51530705583338, interval_steps_per_second: 0.28220669098958623, ppl: 1.3400688992610694, epoch: 
0.5195 [2023-12-26 17:45:53,090] [ INFO] - loss: 0.24711998, learning_rate: 0.0002705, global_step: 61, interval_runtime: 3.3613, interval_samples_per_second: 4.76002560569709, interval_steps_per_second: 0.29750160035606815, ppl: 1.280332717882807, epoch: 0.5281 [2023-12-26 17:45:56,060] [ INFO] - loss: 0.27978322, learning_rate: 0.0002695, global_step: 62, interval_runtime: 2.9697, interval_samples_per_second: 5.387787058462534, interval_steps_per_second: 0.33673669115390836, ppl: 1.3228430153437678, epoch: 0.5368 [2023-12-26 17:46:00,180] [ INFO] - loss: 0.26979572, learning_rate: 0.0002686, global_step: 63, interval_runtime: 4.1194, interval_samples_per_second: 3.8840582897572737, interval_steps_per_second: 0.2427536431098296, ppl: 1.3096968785260072, epoch: 0.5455 [2023-12-26 17:46:03,959] [ INFO] - loss: 0.2797547, learning_rate: 0.0002676, global_step: 64, interval_runtime: 3.7798, interval_samples_per_second: 4.233009314355052, interval_steps_per_second: 0.26456308214719076, ppl: 1.322805288398959, epoch: 0.5541 [2023-12-26 17:46:07,112] [ INFO] - loss: 0.26245928, learning_rate: 0.0002667, global_step: 65, interval_runtime: 3.1532, interval_samples_per_second: 5.0742238975114935, interval_steps_per_second: 0.31713899359446834, ppl: 1.3001235260606159, epoch: 0.5628 [2023-12-26 17:46:10,654] [ INFO] - loss: 0.27325338, learning_rate: 0.0002657, global_step: 66, interval_runtime: 3.5412, interval_samples_per_second: 4.518302126492652, interval_steps_per_second: 0.28239388290579076, ppl: 1.314233203049469, epoch: 0.5714 [2023-12-26 17:46:13,738] [ INFO] - loss: 0.29552907, learning_rate: 0.0002648, global_step: 67, interval_runtime: 3.0848, interval_samples_per_second: 5.186665460984193, interval_steps_per_second: 0.32416659131151204, ppl: 1.343837154562674, epoch: 0.5801 [2023-12-26 17:46:16,404] [ INFO] - loss: 0.27431649, learning_rate: 0.0002638, global_step: 68, interval_runtime: 2.665, interval_samples_per_second: 6.003694960613001, 
interval_steps_per_second: 0.37523093503831256, ppl: 1.3156311204482851, epoch: 0.5887 [2023-12-26 17:46:19,818] [ INFO] - loss: 0.30101836, learning_rate: 0.0002629, global_step: 69, interval_runtime: 3.4145, interval_samples_per_second: 4.685920299323026, interval_steps_per_second: 0.29287001870768914, ppl: 1.351234149969267, epoch: 0.5974 [2023-12-26 17:46:23,446] [ INFO] - loss: 0.26293159, learning_rate: 0.0002619, global_step: 70, interval_runtime: 3.6285, interval_samples_per_second: 4.409583156907597, interval_steps_per_second: 0.2755989473067248, ppl: 1.3007377324396991, epoch: 0.6061 [2023-12-26 17:46:26,805] [ INFO] - loss: 0.25193915, learning_rate: 0.000261, global_step: 71, interval_runtime: 3.3583, interval_samples_per_second: 4.764344747034641, interval_steps_per_second: 0.29777154668966505, ppl: 1.2865177502978775, epoch: 0.6147 [2023-12-26 17:46:29,961] [ INFO] - loss: 0.26743451, learning_rate: 0.00026, global_step: 72, interval_runtime: 3.1567, interval_samples_per_second: 5.068537361968574, interval_steps_per_second: 0.3167835851230359, ppl: 1.3066080572723744, epoch: 0.6234 [2023-12-26 17:46:33,563] [ INFO] - loss: 0.26232645, learning_rate: 0.000259, global_step: 73, interval_runtime: 3.6013, interval_samples_per_second: 4.4428989111833515, interval_steps_per_second: 0.27768118194895947, ppl: 1.299950842121707, epoch: 0.632 [2023-12-26 17:46:36,887] [ INFO] - loss: 0.28119218, learning_rate: 0.0002581, global_step: 74, interval_runtime: 3.3247, interval_samples_per_second: 4.812406825716142, interval_steps_per_second: 0.30077542660725887, ppl: 1.324708161888552, epoch: 0.6407 [2023-12-26 17:46:40,115] [ INFO] - loss: 0.27209201, learning_rate: 0.0002571, global_step: 75, interval_runtime: 3.2275, interval_samples_per_second: 4.957351649423985, interval_steps_per_second: 0.30983447808899905, ppl: 1.312707777997345, epoch: 0.6494 [2023-12-26 17:46:43,942] [ INFO] - loss: 0.27171448, learning_rate: 0.0002562, global_step: 76, interval_runtime: 
3.8267, interval_samples_per_second: 4.181108421820734, interval_steps_per_second: 0.26131927636379587, ppl: 1.3122122849675446, epoch: 0.658 [2023-12-26 17:46:47,543] [ INFO] - loss: 0.27416253, learning_rate: 0.0002552, global_step: 77, interval_runtime: 3.6017, interval_samples_per_second: 4.442374228735902, interval_steps_per_second: 0.2776483892959939, ppl: 1.3154285814728313, epoch: 0.6667 [2023-12-26 17:46:51,255] [ INFO] - loss: 0.2439681, learning_rate: 0.0002543, global_step: 78, interval_runtime: 3.7116, interval_samples_per_second: 4.310802079060973, interval_steps_per_second: 0.2694251299413108, ppl: 1.2763036157547154, epoch: 0.6753 [2023-12-26 17:46:54,776] [ INFO] - loss: 0.27000949, learning_rate: 0.0002533, global_step: 79, interval_runtime: 3.5207, interval_samples_per_second: 4.544553290140272, interval_steps_per_second: 0.284034580633767, ppl: 1.3099768823548728, epoch: 0.684 [2023-12-26 17:46:57,390] [ INFO] - loss: 0.28652525, learning_rate: 0.0002524, global_step: 80, interval_runtime: 2.6142, interval_samples_per_second: 6.120512756437044, interval_steps_per_second: 0.38253204727731527, ppl: 1.3317917952124918, epoch: 0.6926 [2023-12-26 17:47:00,768] [ INFO] - loss: 0.25104171, learning_rate: 0.0002514, global_step: 81, interval_runtime: 3.3783, interval_samples_per_second: 4.736054461807919, interval_steps_per_second: 0.29600340386299495, ppl: 1.2853636957328707, epoch: 0.7013 [2023-12-26 17:47:03,697] [ INFO] - loss: 0.2385323, learning_rate: 0.0002505, global_step: 82, interval_runtime: 2.9293, interval_samples_per_second: 5.462112555607908, interval_steps_per_second: 0.3413820347254943, ppl: 1.269384706500266, epoch: 0.71 [2023-12-26 17:47:07,113] [ INFO] - loss: 0.24549332, learning_rate: 0.0002495, global_step: 83, interval_runtime: 3.4153, interval_samples_per_second: 4.684774406946456, interval_steps_per_second: 0.2927984004341535, ppl: 1.2782517448405986, epoch: 0.7186 [2023-12-26 17:47:10,832] [ INFO] - loss: 0.22970712, 
learning_rate: 0.0002486, global_step: 84, interval_runtime: 3.7191, interval_samples_per_second: 4.302069220702643, interval_steps_per_second: 0.2688793262939152, ppl: 1.258231445133833, epoch: 0.7273 [2023-12-26 17:47:14,400] [ INFO] - loss: 0.25397247, learning_rate: 0.0002476, global_step: 85, interval_runtime: 3.568, interval_samples_per_second: 4.484274570278145, interval_steps_per_second: 0.28026716064238405, ppl: 1.2891363138565606, epoch: 0.7359 [2023-12-26 17:47:17,574] [ INFO] - loss: 0.2352851, learning_rate: 0.0002467, global_step: 86, interval_runtime: 3.1738, interval_samples_per_second: 5.041209940914457, interval_steps_per_second: 0.31507562130715355, ppl: 1.2652694456349067, epoch: 0.7446 [2023-12-26 17:47:20,941] [ INFO] - loss: 0.27693576, learning_rate: 0.0002457, global_step: 87, interval_runtime: 3.3667, interval_samples_per_second: 4.752486174464305, interval_steps_per_second: 0.2970303859040191, ppl: 1.3190816305091784, epoch: 0.7532 [2023-12-26 17:47:24,730] [ INFO] - loss: 0.31699374, learning_rate: 0.0002448, global_step: 88, interval_runtime: 3.7896, interval_samples_per_second: 4.222126759815494, interval_steps_per_second: 0.2638829224884684, ppl: 1.3729939769562607, epoch: 0.7619 [2023-12-26 17:47:27,797] [ INFO] - loss: 0.32770675, learning_rate: 0.0002438, global_step: 89, interval_runtime: 3.0667, interval_samples_per_second: 5.217395063161403, interval_steps_per_second: 0.3260871914475877, ppl: 1.3877819455565001, epoch: 0.7706 [2023-12-26 17:47:30,898] [ INFO] - loss: 0.22880366, learning_rate: 0.0002429, global_step: 90, interval_runtime: 3.1017, interval_samples_per_second: 5.158516492850441, interval_steps_per_second: 0.32240728080315256, ppl: 1.2570951967072017, epoch: 0.7792 [2023-12-26 17:47:34,383] [ INFO] - loss: 0.22428387, learning_rate: 0.0002419, global_step: 91, interval_runtime: 3.4843, interval_samples_per_second: 4.591995491543428, interval_steps_per_second: 0.28699971822146425, ppl: 1.2514262113704304, epoch: 
0.7879 [2023-12-26 17:47:39,494] [ INFO] - loss: 0.26413378, learning_rate: 0.000241, global_step: 92, interval_runtime: 5.1112, interval_samples_per_second: 3.1303636372371897, interval_steps_per_second: 0.19564772732732436, ppl: 1.3023024066636666, epoch: 0.7965 [2023-12-26 17:47:43,067] [ INFO] - loss: 0.27162313, learning_rate: 0.00024, global_step: 93, interval_runtime: 3.5728, interval_samples_per_second: 4.478319289217104, interval_steps_per_second: 0.279894955576069, ppl: 1.3120924198502355, epoch: 0.8052 [2023-12-26 17:47:45,606] [ INFO] - loss: 0.26476258, learning_rate: 0.000239, global_step: 94, interval_runtime: 2.5391, interval_samples_per_second: 6.301347343074687, interval_steps_per_second: 0.3938342089421679, ppl: 1.303121551929258, epoch: 0.8139 [2023-12-26 17:47:48,807] [ INFO] - loss: 0.3001802, learning_rate: 0.0002381, global_step: 95, interval_runtime: 3.2009, interval_samples_per_second: 4.998657318996902, interval_steps_per_second: 0.3124160824373064, ppl: 1.3501020740507794, epoch: 0.8225 [2023-12-26 17:47:52,331] [ INFO] - loss: 0.2234904, learning_rate: 0.0002371, global_step: 96, interval_runtime: 3.5241, interval_samples_per_second: 4.540179100277009, interval_steps_per_second: 0.28376119376731307, ppl: 1.2504336360559385, epoch: 0.8312 [2023-12-26 17:47:55,521] [ INFO] - loss: 0.27073395, learning_rate: 0.0002362, global_step: 97, interval_runtime: 3.1898, interval_samples_per_second: 5.0159280647981275, interval_steps_per_second: 0.31349550404988297, ppl: 1.3109262520557279, epoch: 0.8398 [2023-12-26 17:47:59,074] [ INFO] - loss: 0.26731879, learning_rate: 0.0002352, global_step: 98, interval_runtime: 3.553, interval_samples_per_second: 4.503248218197377, interval_steps_per_second: 0.28145301363733605, ppl: 1.3064568653361208, epoch: 0.8485 [2023-12-26 17:48:03,000] [ INFO] - loss: 0.25308639, learning_rate: 0.0002343, global_step: 99, interval_runtime: 3.9263, interval_samples_per_second: 4.075121221879381, 
interval_steps_per_second: 0.2546950763674613, ppl: 1.2879945418769403, epoch: 0.8571 [2023-12-26 17:48:06,504] [ INFO] - loss: 0.2433984, learning_rate: 0.0002333, global_step: 100, interval_runtime: 3.5039, interval_samples_per_second: 4.566329732828105, interval_steps_per_second: 0.2853956083017566, ppl: 1.275576712662826, epoch: 0.8658 [2023-12-26 17:48:10,291] [ INFO] - loss: 0.26445276, learning_rate: 0.0002324, global_step: 101, interval_runtime: 3.7869, interval_samples_per_second: 4.2250563366120515, interval_steps_per_second: 0.2640660210382532, ppl: 1.3027178813458784, epoch: 0.8745 [2023-12-26 17:48:13,416] [ INFO] - loss: 0.24420807, learning_rate: 0.0002314, global_step: 102, interval_runtime: 3.1254, interval_samples_per_second: 5.119266901857392, interval_steps_per_second: 0.319954181366087, ppl: 1.2766099270846831, epoch: 0.8831 [2023-12-26 17:48:17,553] [ INFO] - loss: 0.2597208, learning_rate: 0.0002305, global_step: 103, interval_runtime: 4.1361, interval_samples_per_second: 3.8683560753211825, interval_steps_per_second: 0.2417722547075739, ppl: 1.2965680343304327, epoch: 0.8918 [2023-12-26 17:48:21,048] [ INFO] - loss: 0.24676055, learning_rate: 0.0002295, global_step: 104, interval_runtime: 3.4953, interval_samples_per_second: 4.577567099565918, interval_steps_per_second: 0.2860979437228699, ppl: 1.2798726105871545, epoch: 0.9004 [2023-12-26 17:48:24,069] [ INFO] - loss: 0.22971785, learning_rate: 0.0002286, global_step: 105, interval_runtime: 3.0213, interval_samples_per_second: 5.295789070145969, interval_steps_per_second: 0.33098681688412307, ppl: 1.2582449460296714, epoch: 0.9091 [2023-12-26 17:48:26,900] [ INFO] - loss: 0.25949037, learning_rate: 0.0002276, global_step: 106, interval_runtime: 2.8308, interval_samples_per_second: 5.652026202735068, interval_steps_per_second: 0.35325163767094175, ppl: 1.296269300578213, epoch: 0.9177 [2023-12-26 17:48:29,737] [ INFO] - loss: 0.26405108, learning_rate: 0.0002267, global_step: 107, 
interval_runtime: 2.8369, interval_samples_per_second: 5.639943002919188, interval_steps_per_second: 0.35249643768244926, ppl: 1.3021947107079246, epoch: 0.9264 [2023-12-26 17:48:32,962] [ INFO] - loss: 0.30479836, learning_rate: 0.0002257, global_step: 108, interval_runtime: 3.2252, interval_samples_per_second: 4.9609837270178225, interval_steps_per_second: 0.3100614829386139, ppl: 1.3563514807180617, epoch: 0.9351 [2023-12-26 17:48:36,223] [ INFO] - loss: 0.291565, learning_rate: 0.0002248, global_step: 109, interval_runtime: 3.2612, interval_samples_per_second: 4.906185363514207, interval_steps_per_second: 0.30663658521963794, ppl: 1.338520634504136, epoch: 0.9437 [2023-12-26 17:48:39,279] [ INFO] - loss: 0.22575521, learning_rate: 0.0002238, global_step: 110, interval_runtime: 3.0558, interval_samples_per_second: 5.236013050696715, interval_steps_per_second: 0.3272508156685447, ppl: 1.2532688400464898, epoch: 0.9524 [2023-12-26 17:48:43,107] [ INFO] - loss: 0.2475778, learning_rate: 0.0002229, global_step: 111, interval_runtime: 3.8281, interval_samples_per_second: 4.1796288008266975, interval_steps_per_second: 0.2612268000516686, ppl: 1.2809190140065132, epoch: 0.961 [2023-12-26 17:48:46,229] [ INFO] - loss: 0.25429082, learning_rate: 0.0002219, global_step: 112, interval_runtime: 3.1222, interval_samples_per_second: 5.124616267721022, interval_steps_per_second: 0.32028851673256387, ppl: 1.2895467757338794, epoch: 0.9697 [2023-12-26 17:48:49,501] [ INFO] - loss: 0.28627616, learning_rate: 0.000221, global_step: 113, interval_runtime: 3.2718, interval_samples_per_second: 4.890284026761988, interval_steps_per_second: 0.3056427516726242, ppl: 1.3314601005068545, epoch: 0.9784 [2023-12-26 17:48:52,786] [ INFO] - loss: 0.26187339, learning_rate: 0.00022, global_step: 114, interval_runtime: 3.2853, interval_samples_per_second: 4.870233688813434, interval_steps_per_second: 0.3043896055508396, ppl: 1.2993620197891702, epoch: 0.987 [2023-12-26 17:48:56,188] [ INFO] - 
loss: 0.26199824, learning_rate: 0.000219, global_step: 115, interval_runtime: 3.401, interval_samples_per_second: 4.704479187251886, interval_steps_per_second: 0.2940299492032429, ppl: 1.2995242552646797, epoch: 0.9957
[2023-12-26 17:48:57,346] [ INFO] - ***** Running Evaluation *****
[2023-12-26 17:48:57,354] [ INFO] - Num examples = 206
[2023-12-26 17:48:57,354] [ INFO] - Total prediction steps = 26
[2023-12-26 17:48:57,354] [ INFO] - Pre device batch size = 8
[2023-12-26 17:48:57,354] [ INFO] - Total Batch size = 8
[2023-12-26 17:49:11,883] [ INFO] - eval_loss: 0.25766491889953613, eval_accuracy: 0.9982474588152822, eval_runtime: 14.5363, eval_samples_per_second: 14.171448478326102, eval_steps_per_second: 1.7886294195945565, eval_ppl: 1.2939051828041612, epoch: 0.9957
[2023-12-26 17:49:11,884] [ INFO] - Saving model checkpoint to ./checkpoints/llama_lora_ckpts/checkpoint-115
[2023-12-26 17:49:11,885] [ INFO] - tokenizer config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-115/tokenizer_config.json
[2023-12-26 17:49:11,885] [ INFO] - Special tokens file saved in ./checkpoints/llama_lora_ckpts/checkpoint-115/special_tokens_map.json
[2023-12-26 17:49:11,887] [ INFO] - Chat-template config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-115/chat_template.json
[2023-12-26 17:49:12,064] [ INFO] - Saving optimizer files.
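Note on reading this log: in every record the reported ppl (perplexity) appears to be simply exp(loss), and the same holds for eval_ppl and eval_loss. The following is a minimal sketch for checking that relationship, not part of the original notebook. The helper name `parse_metrics` and the regular expression are illustrative assumptions, and the two sample records are copied from the log above (step 115 and the evaluation summary).

```python
import math
import re
from typing import Dict

# Matches "key: number" pairs such as "loss: 0.26199824" or "global_step: 115".
# The timestamp is skipped because its colons are not followed by a space.
METRIC = re.compile(r"(\w+): ([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_metrics(record: str) -> Dict[str, float]:
    """Hypothetical helper: turn one log record into a {name: value} dict."""
    return {key: float(value) for key, value in METRIC.findall(record)}


# Two records copied from the training log above.
train_record = (
    "[2023-12-26 17:48:56,188] [ INFO] - loss: 0.26199824, "
    "learning_rate: 0.000219, global_step: 115, interval_runtime: 3.401, "
    "interval_samples_per_second: 4.704479187251886, "
    "interval_steps_per_second: 0.2940299492032429, "
    "ppl: 1.2995242552646797, epoch: 0.9957"
)
eval_record = (
    "[2023-12-26 17:49:11,883] [ INFO] - eval_loss: 0.25766491889953613, "
    "eval_accuracy: 0.9982474588152822, eval_runtime: 14.5363, "
    "eval_samples_per_second: 14.171448478326102, "
    "eval_steps_per_second: 1.7886294195945565, "
    "eval_ppl: 1.2939051828041612, epoch: 0.9957"
)

train = parse_metrics(train_record)
evaluation = parse_metrics(eval_record)

# Perplexity recomputed from the loss, next to the value the log reports.
print(math.exp(train["loss"]), train["ppl"])
print(math.exp(evaluation["eval_loss"]), evaluation["eval_ppl"])
```

Because ppl carries no information beyond the loss, tracking either one is enough when comparing checkpoints. After the first epoch the evaluation loss (about 0.258) is close to the training loss of the last steps (about 0.26).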
[2023-12-26 17:49:16,238] [ INFO] - loss: 0.42286861, learning_rate: 0.0002181, global_step: 116, interval_runtime: 20.0504, interval_samples_per_second: 0.7979880325215534, interval_steps_per_second: 0.049874252032597086, ppl: 1.526333737801267, epoch: 1.0087 [2023-12-26 17:49:19,708] [ INFO] - loss: 0.27453929, learning_rate: 0.0002171, global_step: 117, interval_runtime: 3.4699, interval_samples_per_second: 4.611077381935594, interval_steps_per_second: 0.28819233637097463, ppl: 1.3159242757182052, epoch: 1.0173 [2023-12-26 17:49:22,841] [ INFO] - loss: 0.23074579, learning_rate: 0.0002162, global_step: 118, interval_runtime: 3.1331, interval_samples_per_second: 5.106751557924221, interval_steps_per_second: 0.3191719723702638, ppl: 1.2595390113362899, epoch: 1.026 [2023-12-26 17:49:26,209] [ INFO] - loss: 0.24562941, learning_rate: 0.0002152, global_step: 119, interval_runtime: 3.368, interval_samples_per_second: 4.750556789404557, interval_steps_per_second: 0.29690979933778483, ppl: 1.2784257139580142, epoch: 1.0346 [2023-12-26 17:49:29,819] [ INFO] - loss: 0.24127965, learning_rate: 0.0002143, global_step: 120, interval_runtime: 3.6097, interval_samples_per_second: 4.432549666070322, interval_steps_per_second: 0.27703435412939514, ppl: 1.272876945578587, epoch: 1.0433 [2023-12-26 17:49:34,017] [ INFO] - loss: 0.26145872, learning_rate: 0.0002133, global_step: 121, interval_runtime: 4.1987, interval_samples_per_second: 3.8107239140857154, interval_steps_per_second: 0.23817024463035721, ppl: 1.2988233250384196, epoch: 1.0519 [2023-12-26 17:49:37,157] [ INFO] - loss: 0.24270561, learning_rate: 0.0002124, global_step: 122, interval_runtime: 3.1392, interval_samples_per_second: 5.096851230335854, interval_steps_per_second: 0.3185532018959909, ppl: 1.274693311912996, epoch: 1.0606 [2023-12-26 17:49:40,476] [ INFO] - loss: 0.24763, learning_rate: 0.0002114, global_step: 123, interval_runtime: 3.319, interval_samples_per_second: 4.820663331917016, 
interval_steps_per_second: 0.3012914582448135, ppl: 1.2809858797242244, epoch: 1.0693 [2023-12-26 17:49:43,025] [ INFO] - loss: 0.23295119, learning_rate: 0.0002105, global_step: 124, interval_runtime: 2.5489, interval_samples_per_second: 6.277124540034533, interval_steps_per_second: 0.3923202837521583, ppl: 1.2623198639909898, epoch: 1.0779 [2023-12-26 17:49:46,303] [ INFO] - loss: 0.27050617, learning_rate: 0.0002095, global_step: 125, interval_runtime: 3.2785, interval_samples_per_second: 4.88032202089331, interval_steps_per_second: 0.30502012630583186, ppl: 1.3106276832793233, epoch: 1.0866 [2023-12-26 17:49:49,566] [ INFO] - loss: 0.24398029, learning_rate: 0.0002086, global_step: 126, interval_runtime: 3.2631, interval_samples_per_second: 4.90330863858365, interval_steps_per_second: 0.3064567899114781, ppl: 1.2763191739906188, epoch: 1.0952 [2023-12-26 17:49:53,029] [ INFO] - loss: 0.24883732, learning_rate: 0.0002076, global_step: 127, interval_runtime: 3.4632, interval_samples_per_second: 4.620060157492262, interval_steps_per_second: 0.2887537598432664, ppl: 1.2825333735686955, epoch: 1.1039 [2023-12-26 17:49:56,814] [ INFO] - loss: 0.24923915, learning_rate: 0.0002067, global_step: 128, interval_runtime: 3.7849, interval_samples_per_second: 4.227313505925205, interval_steps_per_second: 0.26420709412032534, ppl: 1.2830488375116988, epoch: 1.1126 [2023-12-26 17:50:00,253] [ INFO] - loss: 0.25661612, learning_rate: 0.0002057, global_step: 129, interval_runtime: 3.4389, interval_samples_per_second: 4.6527112086549245, interval_steps_per_second: 0.2907944505409328, ppl: 1.29254884785796, epoch: 1.1212 [2023-12-26 17:50:03,993] [ INFO] - loss: 0.24869432, learning_rate: 0.0002048, global_step: 130, interval_runtime: 3.7395, interval_samples_per_second: 4.278694837629995, interval_steps_per_second: 0.2674184273518747, ppl: 1.2823499844089126, epoch: 1.1299 [2023-12-26 17:50:08,240] [ INFO] - loss: 0.26458281, learning_rate: 0.0002038, global_step: 131, 
interval_runtime: 4.2476, interval_samples_per_second: 3.7668204909298133, interval_steps_per_second: 0.23542628068311333, ppl: 1.3028873108232604, epoch: 1.1385 [2023-12-26 17:50:11,384] [ INFO] - loss: 0.25587648, learning_rate: 0.0002029, global_step: 132, interval_runtime: 3.144, interval_samples_per_second: 5.089023319435538, interval_steps_per_second: 0.3180639574647211, ppl: 1.2915931804966019, epoch: 1.1472 [2023-12-26 17:50:14,954] [ INFO] - loss: 0.26866463, learning_rate: 0.0002019, global_step: 133, interval_runtime: 3.5699, interval_samples_per_second: 4.481976585800716, interval_steps_per_second: 0.28012353661254474, ppl: 1.3082163309577965, epoch: 1.1558 [2023-12-26 17:50:17,765] [ INFO] - loss: 0.24390888, learning_rate: 0.000201, global_step: 134, interval_runtime: 2.811, interval_samples_per_second: 5.691881395841557, interval_steps_per_second: 0.35574258724009733, ppl: 1.2762280352925501, epoch: 1.1645 [2023-12-26 17:50:21,071] [ INFO] - loss: 0.25758892, learning_rate: 0.0002, global_step: 135, interval_runtime: 3.3064, interval_samples_per_second: 4.839085183257765, interval_steps_per_second: 0.30244282395361033, ppl: 1.2938068511707594, epoch: 1.1732 [2023-12-26 17:50:24,536] [ INFO] - loss: 0.26233327, learning_rate: 0.000199, global_step: 136, interval_runtime: 3.4641, interval_samples_per_second: 4.618844516366238, interval_steps_per_second: 0.28867778227288987, ppl: 1.2999597078166822, epoch: 1.1818 [2023-12-26 17:50:27,393] [ INFO] - loss: 0.23059112, learning_rate: 0.0001981, global_step: 137, interval_runtime: 2.8578, interval_samples_per_second: 5.598641286619746, interval_steps_per_second: 0.34991508041373415, ppl: 1.2593442135024853, epoch: 1.1905 [2023-12-26 17:50:30,817] [ INFO] - loss: 0.25141767, learning_rate: 0.0001971, global_step: 138, interval_runtime: 3.4232, interval_samples_per_second: 4.674031065873333, interval_steps_per_second: 0.2921269416170833, ppl: 1.2858470319197617, epoch: 1.1991 [2023-12-26 17:50:34,215] [ INFO] 
- loss: 0.24990171, learning_rate: 0.0001962, global_step: 139, interval_runtime: 3.3985, interval_samples_per_second: 4.707977607017454, interval_steps_per_second: 0.29424860043859086, ppl: 1.2838992160317682, epoch: 1.2078 [2023-12-26 17:50:37,795] [ INFO] - loss: 0.27762073, learning_rate: 0.0001952, global_step: 140, interval_runtime: 3.58, interval_samples_per_second: 4.469330688584939, interval_steps_per_second: 0.27933316803655867, ppl: 1.3199854713702266, epoch: 1.2165 [2023-12-26 17:50:40,972] [ INFO] - loss: 0.25949112, learning_rate: 0.0001943, global_step: 141, interval_runtime: 3.1773, interval_samples_per_second: 5.035766780695585, interval_steps_per_second: 0.3147354237934741, ppl: 1.296270272780553, epoch: 1.2251 [2023-12-26 17:50:44,141] [ INFO] - loss: 0.27442539, learning_rate: 0.0001933, global_step: 142, interval_runtime: 3.1688, interval_samples_per_second: 5.049256401457214, interval_steps_per_second: 0.31557852509107587, ppl: 1.315774400478758, epoch: 1.2338 [2023-12-26 17:50:47,584] [ INFO] - loss: 0.24077226, learning_rate: 0.0001924, global_step: 143, interval_runtime: 3.4431, interval_samples_per_second: 4.647037890792138, interval_steps_per_second: 0.2904398681745086, ppl: 1.2722312643651177, epoch: 1.2424 [2023-12-26 17:50:51,579] [ INFO] - loss: 0.22204766, learning_rate: 0.0001914, global_step: 144, interval_runtime: 3.9946, interval_samples_per_second: 4.005430431401292, interval_steps_per_second: 0.25033940196258075, ppl: 1.2486308861942246, epoch: 1.2511 [2023-12-26 17:50:54,459] [ INFO] - loss: 0.19586501, learning_rate: 0.0001905, global_step: 145, interval_runtime: 2.8804, interval_samples_per_second: 5.554750144996249, interval_steps_per_second: 0.34717188406226557, ppl: 1.2163626974508257, epoch: 1.2597 [2023-12-26 17:50:58,505] [ INFO] - loss: 0.25291246, learning_rate: 0.0001895, global_step: 146, interval_runtime: 4.0459, interval_samples_per_second: 3.9546168944965663, interval_steps_per_second: 0.2471635559060354, ppl: 
1.2877705404671191, epoch: 1.2684 [2023-12-26 17:51:01,899] [ INFO] - loss: 0.25938711, learning_rate: 0.0001886, global_step: 147, interval_runtime: 3.3934, interval_samples_per_second: 4.715020224835305, interval_steps_per_second: 0.29468876405220656, ppl: 1.2961354547208157, epoch: 1.2771 [2023-12-26 17:51:05,917] [ INFO] - loss: 0.24724537, learning_rate: 0.0001876, global_step: 148, interval_runtime: 3.7245, interval_samples_per_second: 4.295922984210759, interval_steps_per_second: 0.26849518651317245, ppl: 1.2804932688678359, epoch: 1.2857 [2023-12-26 17:51:09,845] [ INFO] - loss: 0.23437944, learning_rate: 0.0001867, global_step: 149, interval_runtime: 4.2219, interval_samples_per_second: 3.7897346655173596, interval_steps_per_second: 0.23685841659483498, ppl: 1.2641240604518345, epoch: 1.2944 [2023-12-26 17:51:12,408] [ INFO] - loss: 0.26815447, learning_rate: 0.0001857, global_step: 150, interval_runtime: 2.563, interval_samples_per_second: 6.242649019140541, interval_steps_per_second: 0.3901655636962838, ppl: 1.3075491015257499, epoch: 1.303 [2023-12-26 17:51:16,165] [ INFO] - loss: 0.27968669, learning_rate: 0.0001848, global_step: 151, interval_runtime: 3.756, interval_samples_per_second: 4.259878168341987, interval_steps_per_second: 0.2662423855213742, ppl: 1.3227153274704508, epoch: 1.3117 [2023-12-26 17:51:19,702] [ INFO] - loss: 0.25936183, learning_rate: 0.0001838, global_step: 152, interval_runtime: 3.5382, interval_samples_per_second: 4.522101530478644, interval_steps_per_second: 0.28263134565491527, ppl: 1.296102688830683, epoch: 1.3203 [2023-12-26 17:51:23,326] [ INFO] - loss: 0.28054172, learning_rate: 0.0001829, global_step: 153, interval_runtime: 3.6235, interval_samples_per_second: 4.415651502541165, interval_steps_per_second: 0.2759782189088228, ppl: 1.323846772397645, epoch: 1.329 [2023-12-26 17:51:26,726] [ INFO] - loss: 0.27994087, learning_rate: 0.0001819, global_step: 154, interval_runtime: 3.3999, interval_samples_per_second: 
4.705982870381181, interval_steps_per_second: 0.2941239293988238, ppl: 1.3230515779846548, epoch: 1.3377 [2023-12-26 17:51:30,480] [ INFO] - loss: 0.24816436, learning_rate: 0.000181, global_step: 155, interval_runtime: 3.754, interval_samples_per_second: 4.262152946368684, interval_steps_per_second: 0.26638455914804277, ppl: 1.2816705702582385, epoch: 1.3463 [2023-12-26 17:51:33,859] [ INFO] - loss: 0.27234039, learning_rate: 0.00018, global_step: 156, interval_runtime: 3.3798, interval_samples_per_second: 4.734005144076394, interval_steps_per_second: 0.29587532150477464, ppl: 1.3130338688507908, epoch: 1.355 [2023-12-26 17:51:36,960] [ INFO] - loss: 0.27089909, learning_rate: 0.000179, global_step: 157, interval_runtime: 3.1011, interval_samples_per_second: 5.159453649892815, interval_steps_per_second: 0.32246585311830095, ppl: 1.3111427562932552, epoch: 1.3636 [2023-12-26 17:51:40,082] [ INFO] - loss: 0.21484688, learning_rate: 0.0001781, global_step: 158, interval_runtime: 3.121, interval_samples_per_second: 5.126642982820146, interval_steps_per_second: 0.3204151864262591, ppl: 1.239672063846393, epoch: 1.3723 [2023-12-26 17:51:44,127] [ INFO] - loss: 0.25808209, learning_rate: 0.0001771, global_step: 159, interval_runtime: 4.046, interval_samples_per_second: 3.9545353324650216, interval_steps_per_second: 0.24715845827906385, ppl: 1.2944450752591026, epoch: 1.381 [2023-12-26 17:51:47,282] [ INFO] - loss: 0.25623158, learning_rate: 0.0001762, global_step: 160, interval_runtime: 3.1549, interval_samples_per_second: 5.071501292570801, interval_steps_per_second: 0.31696883078567506, ppl: 1.2920519066770093, epoch: 1.3896 [2023-12-26 17:51:50,774] [ INFO] - loss: 0.25370789, learning_rate: 0.0001752, global_step: 161, interval_runtime: 3.4915, interval_samples_per_second: 4.582599389153949, interval_steps_per_second: 0.2864124618221218, ppl: 1.2887952792880928, epoch: 1.3983 [2023-12-26 17:51:54,296] [ INFO] - loss: 0.2352947, learning_rate: 0.0001743, global_step: 
162, interval_runtime: 3.5221, interval_samples_per_second: 4.542725048925997, interval_steps_per_second: 0.28392031555787484, ppl: 1.2652815922798888, epoch: 1.4069 [2023-12-26 17:51:58,034] [ INFO] - loss: 0.25839075, learning_rate: 0.0001733, global_step: 163, interval_runtime: 3.7382, interval_samples_per_second: 4.280177741040718, interval_steps_per_second: 0.26751110881504486, ppl: 1.2948446803439122, epoch: 1.4156 [2023-12-26 17:52:01,288] [ INFO] - loss: 0.2666446, learning_rate: 0.0001724, global_step: 164, interval_runtime: 3.2534, interval_samples_per_second: 4.917902685289206, interval_steps_per_second: 0.30736891783057535, ppl: 1.3055763620286938, epoch: 1.4242 [2023-12-26 17:52:03,999] [ INFO] - loss: 0.25938639, learning_rate: 0.0001714, global_step: 165, interval_runtime: 2.7114, interval_samples_per_second: 5.901048876214106, interval_steps_per_second: 0.3688155547633816, ppl: 1.2961345215036244, epoch: 1.4329 [2023-12-26 17:52:07,377] [ INFO] - loss: 0.24273644, learning_rate: 0.0001705, global_step: 166, interval_runtime: 3.3783, interval_samples_per_second: 4.736141699063666, interval_steps_per_second: 0.29600885619147915, ppl: 1.2747326113135993, epoch: 1.4416 [2023-12-26 17:52:10,345] [ INFO] - loss: 0.28284302, learning_rate: 0.0001695, global_step: 167, interval_runtime: 2.9683, interval_samples_per_second: 5.3902156492231486, interval_steps_per_second: 0.3368884780764468, ppl: 1.3268968491997402, epoch: 1.4502 [2023-12-26 17:52:14,356] [ INFO] - loss: 0.2741535, learning_rate: 0.0001686, global_step: 168, interval_runtime: 4.0106, interval_samples_per_second: 3.9894091906581686, interval_steps_per_second: 0.24933807441613554, ppl: 1.315416703206371, epoch: 1.4589 [2023-12-26 17:52:17,175] [ INFO] - loss: 0.26654774, learning_rate: 0.0001676, global_step: 169, interval_runtime: 2.8192, interval_samples_per_second: 5.675461935112414, interval_steps_per_second: 0.35471637094452585, ppl: 1.305449910026437, epoch: 1.4675 [2023-12-26 
17:52:20,552] [ INFO] - loss: 0.26166382, learning_rate: 0.0001667, global_step: 170, interval_runtime: 3.3769, interval_samples_per_second: 4.738139341921953, interval_steps_per_second: 0.2961337088701221, ppl: 1.29908974102241, epoch: 1.4762 [2023-12-26 17:52:23,081] [ INFO] - loss: 0.25510639, learning_rate: 0.0001657, global_step: 171, interval_runtime: 2.5285, interval_samples_per_second: 6.327875295277558, interval_steps_per_second: 0.39549220595484735, ppl: 1.2905989203882529, epoch: 1.4848 [2023-12-26 17:52:25,945] [ INFO] - loss: 0.24863556, learning_rate: 0.0001648, global_step: 172, interval_runtime: 2.8643, interval_samples_per_second: 5.585921939935049, interval_steps_per_second: 0.34912012124594055, ppl: 1.2822746357375945, epoch: 1.4935 [2023-12-26 17:52:28,796] [ INFO] - loss: 0.23852955, learning_rate: 0.0001638, global_step: 173, interval_runtime: 2.8505, interval_samples_per_second: 5.6130172682452075, interval_steps_per_second: 0.35081357926532547, ppl: 1.269381215697123, epoch: 1.5022 [2023-12-26 17:52:31,756] [ INFO] - loss: 0.23728807, learning_rate: 0.0001629, global_step: 174, interval_runtime: 2.9605, interval_samples_per_second: 5.404557332604875, interval_steps_per_second: 0.3377848332878047, ppl: 1.267806282132004, epoch: 1.5108 [2023-12-26 17:52:35,203] [ INFO] - loss: 0.22020681, learning_rate: 0.0001619, global_step: 175, interval_runtime: 3.4468, interval_samples_per_second: 4.6420115031974465, interval_steps_per_second: 0.2901257189498404, ppl: 1.2463344583654559, epoch: 1.5195 [2023-12-26 17:52:38,768] [ INFO] - loss: 0.25036472, learning_rate: 0.000161, global_step: 176, interval_runtime: 3.5652, interval_samples_per_second: 4.487882178626485, interval_steps_per_second: 0.2804926361641553, ppl: 1.2844938118490652, epoch: 1.5281 [2023-12-26 17:52:43,140] [ INFO] - loss: 0.23791191, learning_rate: 0.00016, global_step: 177, interval_runtime: 4.3724, interval_samples_per_second: 3.6593169006811537, interval_steps_per_second: 
0.2287073062925721, ppl: 1.2685974371544657, epoch: 1.5368 [2023-12-26 17:52:46,225] [ INFO] - loss: 0.20882736, learning_rate: 0.000159, global_step: 178, interval_runtime: 3.0847, interval_samples_per_second: 5.186969734280591, interval_steps_per_second: 0.32418560839253696, ppl: 1.232232247590898, epoch: 1.5455 [2023-12-26 17:52:50,148] [ INFO] - loss: 0.27170029, learning_rate: 0.0001581, global_step: 179, interval_runtime: 3.9226, interval_samples_per_second: 4.078900434792255, interval_steps_per_second: 0.2549312771745159, ppl: 1.3121936648073314, epoch: 1.5541 [2023-12-26 17:52:54,298] [ INFO] - loss: 0.28896001, learning_rate: 0.0001571, global_step: 180, interval_runtime: 4.1502, interval_samples_per_second: 3.8552488398152898, interval_steps_per_second: 0.2409530524884556, ppl: 1.3350383392778098, epoch: 1.5628 [2023-12-26 17:52:57,714] [ INFO] - loss: 0.25084904, learning_rate: 0.0001562, global_step: 181, interval_runtime: 3.4157, interval_samples_per_second: 4.684270169488852, interval_steps_per_second: 0.29276688559305325, ppl: 1.2851160685655432, epoch: 1.5714 [2023-12-26 17:53:01,407] [ INFO] - loss: 0.22257608, learning_rate: 0.0001552, global_step: 182, interval_runtime: 3.6934, interval_samples_per_second: 4.332054394895487, interval_steps_per_second: 0.27075339968096795, ppl: 1.2492908620839802, epoch: 1.5801 [2023-12-26 17:53:04,779] [ INFO] - loss: 0.26296413, learning_rate: 0.0001543, global_step: 183, interval_runtime: 3.3723, interval_samples_per_second: 4.744499599528044, interval_steps_per_second: 0.29653122497050277, ppl: 1.3007800591341643, epoch: 1.5887 [2023-12-26 17:53:07,778] [ INFO] - loss: 0.2423287, learning_rate: 0.0001533, global_step: 184, interval_runtime: 2.9986, interval_samples_per_second: 5.335845010280282, interval_steps_per_second: 0.33349031314251765, ppl: 1.2742129577876262, epoch: 1.5974 [2023-12-26 17:53:10,936] [ INFO] - loss: 0.27385578, learning_rate: 0.0001524, global_step: 185, interval_runtime: 3.1584, 
interval_samples_per_second: 5.065927928991564, interval_steps_per_second: 0.31662049556197275, ppl: 1.315025135637133, epoch: 1.6061 [2023-12-26 17:53:14,781] [ INFO] - loss: 0.27811337, learning_rate: 0.0001514, global_step: 186, interval_runtime: 3.8442, interval_samples_per_second: 4.1621134343028325, interval_steps_per_second: 0.26013208964392703, ppl: 1.3206359092155378, epoch: 1.6147 [2023-12-26 17:53:18,501] [ INFO] - loss: 0.24453022, learning_rate: 0.0001505, global_step: 187, interval_runtime: 3.7206, interval_samples_per_second: 4.300356433877232, interval_steps_per_second: 0.268772277117327, ppl: 1.277021253223494, epoch: 1.6234 [2023-12-26 17:53:22,207] [ INFO] - loss: 0.28027064, learning_rate: 0.0001495, global_step: 188, interval_runtime: 3.7061, interval_samples_per_second: 4.3172589965647905, interval_steps_per_second: 0.2698286872852994, ppl: 1.323487952651209, epoch: 1.632 [2023-12-26 17:53:25,552] [ INFO] - loss: 0.25416121, learning_rate: 0.0001486, global_step: 189, interval_runtime: 3.345, interval_samples_per_second: 4.783321627266378, interval_steps_per_second: 0.29895760170414865, ppl: 1.289379648407197, epoch: 1.6407 [2023-12-26 17:53:28,571] [ INFO] - loss: 0.29258764, learning_rate: 0.0001476, global_step: 190, interval_runtime: 3.0188, interval_samples_per_second: 5.3001757277696475, interval_steps_per_second: 0.33126098298560297, ppl: 1.3398901593919177, epoch: 1.6494 [2023-12-26 17:53:31,835] [ INFO] - loss: 0.26587084, learning_rate: 0.0001467, global_step: 191, interval_runtime: 3.2637, interval_samples_per_second: 4.902365879014341, interval_steps_per_second: 0.3063978674383963, ppl: 1.3045665499892738, epoch: 1.658 [2023-12-26 17:53:35,816] [ INFO] - loss: 0.22777697, learning_rate: 0.0001457, global_step: 192, interval_runtime: 3.9808, interval_samples_per_second: 4.019260435176305, interval_steps_per_second: 0.25120377719851905, ppl: 1.2558052119602279, epoch: 1.6667 [2023-12-26 17:53:40,226] [ INFO] - loss: 0.25831285, 
learning_rate: 0.0001448, global_step: 193, interval_runtime: 4.4106, interval_samples_per_second: 3.627641659480362, interval_steps_per_second: 0.22672760371752262, ppl: 1.2947438158720355, epoch: 1.6753 [2023-12-26 17:53:43,513] [ INFO] - loss: 0.27399379, learning_rate: 0.0001438, global_step: 194, interval_runtime: 3.2865, interval_samples_per_second: 4.868390826319412, interval_steps_per_second: 0.30427442664496324, ppl: 1.3152066347801625, epoch: 1.684 [2023-12-26 17:53:47,115] [ INFO] - loss: 0.25389808, learning_rate: 0.0001429, global_step: 195, interval_runtime: 3.6019, interval_samples_per_second: 4.44214133768107, interval_steps_per_second: 0.2776338336050669, ppl: 1.2890404185730422, epoch: 1.6926 [2023-12-26 17:53:50,539] [ INFO] - loss: 0.23455918, learning_rate: 0.0001419, global_step: 196, interval_runtime: 3.4245, interval_samples_per_second: 4.672247791644994, interval_steps_per_second: 0.2920154869778121, ppl: 1.264351294531375, epoch: 1.7013 [2023-12-26 17:53:54,229] [ INFO] - loss: 0.24967569, learning_rate: 0.000141, global_step: 197, interval_runtime: 3.6895, interval_samples_per_second: 4.336603126697901, interval_steps_per_second: 0.27103769541861883, ppl: 1.2836090619225116, epoch: 1.71 [2023-12-26 17:53:58,033] [ INFO] - loss: 0.27436778, learning_rate: 0.00014, global_step: 198, interval_runtime: 3.8043, interval_samples_per_second: 4.2057209552481405, interval_steps_per_second: 0.2628575597030088, ppl: 1.3156986008989742, epoch: 1.7186 [2023-12-26 17:54:01,206] [ INFO] - loss: 0.27817079, learning_rate: 0.000139, global_step: 199, interval_runtime: 3.1734, interval_samples_per_second: 5.041898881432306, interval_steps_per_second: 0.3151186800895191, ppl: 1.3207117423065922, epoch: 1.7273 [2023-12-26 17:54:04,930] [ INFO] - loss: 0.22592455, learning_rate: 0.0001381, global_step: 200, interval_runtime: 3.7237, interval_samples_per_second: 4.296812519168232, interval_steps_per_second: 0.2685507824480145, ppl: 1.2534810865622685, epoch: 
1.7359 [2023-12-26 17:54:08,118] [ INFO] - loss: 0.2252913, learning_rate: 0.0001371, global_step: 201, interval_runtime: 3.1881, interval_samples_per_second: 5.018602571847399, interval_steps_per_second: 0.31366266074046245, ppl: 1.2526875709376046, epoch: 1.7446 [2023-12-26 17:54:11,897] [ INFO] - loss: 0.24922991, learning_rate: 0.0001362, global_step: 202, interval_runtime: 3.779, interval_samples_per_second: 4.233925337932708, interval_steps_per_second: 0.26462033362079423, ppl: 1.283036982195212, epoch: 1.7532 [2023-12-26 17:54:15,445] [ INFO] - loss: 0.24007367, learning_rate: 0.0001352, global_step: 203, interval_runtime: 3.5477, interval_samples_per_second: 4.509910041530718, interval_steps_per_second: 0.2818693775956699, ppl: 1.271342806696099, epoch: 1.7619 [2023-12-26 17:54:19,061] [ INFO] - loss: 0.21843848, learning_rate: 0.0001343, global_step: 204, interval_runtime: 3.6156, interval_samples_per_second: 4.425265522074444, interval_steps_per_second: 0.27657909512965273, ppl: 1.2441324752429004, epoch: 1.7706 [2023-12-26 17:54:22,657] [ INFO] - loss: 0.25901291, learning_rate: 0.0001333, global_step: 205, interval_runtime: 3.5968, interval_samples_per_second: 4.4483577751895425, interval_steps_per_second: 0.2780223609493464, ppl: 1.2956505315684395, epoch: 1.7792 [2023-12-26 17:54:26,160] [ INFO] - loss: 0.25296086, learning_rate: 0.0001324, global_step: 206, interval_runtime: 3.5027, interval_samples_per_second: 4.567842146425268, interval_steps_per_second: 0.28549013415157926, ppl: 1.287832870069642, epoch: 1.7879 [2023-12-26 17:54:31,364] [ INFO] - loss: 0.25822967, learning_rate: 0.0001314, global_step: 207, interval_runtime: 4.8634, interval_samples_per_second: 3.2898795200332103, interval_steps_per_second: 0.20561747000207564, ppl: 1.2946361235604167, epoch: 1.7965 [2023-12-26 17:54:34,513] [ INFO] - loss: 0.25502729, learning_rate: 0.0001305, global_step: 208, interval_runtime: 3.4896, interval_samples_per_second: 4.58503463361906, 
interval_steps_per_second: 0.28656466460119123, ppl: 1.2904968380510597, epoch: 1.8052 [2023-12-26 17:54:37,955] [ INFO] - loss: 0.23156594, learning_rate: 0.0001295, global_step: 209, interval_runtime: 3.4414, interval_samples_per_second: 4.649269288212603, interval_steps_per_second: 0.2905793305132877, ppl: 1.2605724459842225, epoch: 1.8139 [2023-12-26 17:54:41,491] [ INFO] - loss: 0.23533037, learning_rate: 0.0001286, global_step: 210, interval_runtime: 3.5364, interval_samples_per_second: 4.5243706983283145, interval_steps_per_second: 0.28277316864551966, ppl: 1.2653267256792347, epoch: 1.8225 [2023-12-26 17:54:45,119] [ INFO] - loss: 0.24839923, learning_rate: 0.0001276, global_step: 211, interval_runtime: 3.6283, interval_samples_per_second: 4.409803953264618, interval_steps_per_second: 0.2756127470790386, ppl: 1.2819716315788272, epoch: 1.8312 [2023-12-26 17:54:48,561] [ INFO] - loss: 0.2554785, learning_rate: 0.0001267, global_step: 212, interval_runtime: 3.4418, interval_samples_per_second: 4.64878876601975, interval_steps_per_second: 0.2905492978762344, ppl: 1.291079254515542, epoch: 1.8398 [2023-12-26 17:54:51,773] [ INFO] - loss: 0.26169631, learning_rate: 0.0001257, global_step: 213, interval_runtime: 3.2122, interval_samples_per_second: 4.980961213382988, interval_steps_per_second: 0.3113100758364368, ppl: 1.299131949133763, epoch: 1.8485 [2023-12-26 17:54:54,852] [ INFO] - loss: 0.21404707, learning_rate: 0.0001248, global_step: 214, interval_runtime: 3.0789, interval_samples_per_second: 5.196694781124536, interval_steps_per_second: 0.3247934238202835, ppl: 1.2386809581339717, epoch: 1.8571 [2023-12-26 17:54:57,791] [ INFO] - loss: 0.27386618, learning_rate: 0.0001238, global_step: 215, interval_runtime: 2.9393, interval_samples_per_second: 5.4433959968502235, interval_steps_per_second: 0.34021224980313897, ppl: 1.3150388119696605, epoch: 1.8658 [2023-12-26 17:55:00,520] [ INFO] - loss: 0.2628904, learning_rate: 0.0001229, global_step: 216, 
interval_runtime: 2.7284, interval_samples_per_second: 5.864209488383408, interval_steps_per_second: 0.366513093023963, ppl: 1.300684156155911, epoch: 1.8745 [2023-12-26 17:55:04,433] [ INFO] - loss: 0.22695646, learning_rate: 0.0001219, global_step: 217, interval_runtime: 3.913, interval_samples_per_second: 4.088939940065642, interval_steps_per_second: 0.25555874625410263, ppl: 1.2547752338372222, epoch: 1.8831 [2023-12-26 17:55:07,549] [ INFO] - loss: 0.26187435, learning_rate: 0.000121, global_step: 218, interval_runtime: 3.1156, interval_samples_per_second: 5.135382011242782, interval_steps_per_second: 0.32096137570267386, ppl: 1.2993632671773079, epoch: 1.8918 [2023-12-26 17:55:10,801] [ INFO] - loss: 0.27500743, learning_rate: 0.00012, global_step: 219, interval_runtime: 3.2521, interval_samples_per_second: 4.919908025267539, interval_steps_per_second: 0.30749425157922117, ppl: 1.3165404567268755, epoch: 1.9004 [2023-12-26 17:55:15,393] [ INFO] - loss: 0.24148373, learning_rate: 0.000119, global_step: 220, interval_runtime: 4.5918, interval_samples_per_second: 3.4844913366529418, interval_steps_per_second: 0.21778070854080886, ppl: 1.2731367408142449, epoch: 1.9091 [2023-12-26 17:55:18,813] [ INFO] - loss: 0.26187742, learning_rate: 0.0001181, global_step: 221, interval_runtime: 3.4204, interval_samples_per_second: 4.677857654097276, interval_steps_per_second: 0.29236610338107977, ppl: 1.2993672562286616, epoch: 1.9177 [2023-12-26 17:55:22,702] [ INFO] - loss: 0.24043539, learning_rate: 0.0001171, global_step: 222, interval_runtime: 3.8894, interval_samples_per_second: 4.11376320138214, interval_steps_per_second: 0.25711020008638374, ppl: 1.2718027599982764, epoch: 1.9264 [2023-12-26 17:55:25,935] [ INFO] - loss: 0.24163382, learning_rate: 0.0001162, global_step: 223, interval_runtime: 3.2333, interval_samples_per_second: 4.948552835341657, interval_steps_per_second: 0.30928455220885354, ppl: 1.273327840248372, epoch: 1.9351 [2023-12-26 17:55:29,523] [ INFO] 
- loss: 0.25678757, learning_rate: 0.0001152, global_step: 224, interval_runtime: 3.5878, interval_samples_per_second: 4.459531574463244, interval_steps_per_second: 0.27872072340395276, ppl: 1.292770474356314, epoch: 1.9437
[2023-12-26 17:55:33,280] [ INFO] - loss: 0.2650784, learning_rate: 0.0001143, global_step: 225, interval_runtime: 3.7566, interval_samples_per_second: 4.259140901950391, interval_steps_per_second: 0.26619630637189945, ppl: 1.303533168772783, epoch: 1.9524
[2023-12-26 17:55:36,056] [ INFO] - loss: 0.27205297, learning_rate: 0.0001133, global_step: 226, interval_runtime: 2.7762, interval_samples_per_second: 5.76335509034562, interval_steps_per_second: 0.36020969314660123, ppl: 1.3126565308860423, epoch: 1.961
[2023-12-26 17:55:39,553] [ INFO] - loss: 0.26250398, learning_rate: 0.0001124, global_step: 227, interval_runtime: 3.4969, interval_samples_per_second: 4.575467931501679, interval_steps_per_second: 0.28596674571885494, ppl: 1.3001816428811321, epoch: 1.9697
[2023-12-26 17:55:42,754] [ INFO] - loss: 0.22927305, learning_rate: 0.0001114, global_step: 228, interval_runtime: 3.2008, interval_samples_per_second: 4.998771998954195, interval_steps_per_second: 0.3124232499346372, ppl: 1.2576854031292437, epoch: 1.9784
[2023-12-26 17:55:46,400] [ INFO] - loss: 0.23428649, learning_rate: 0.0001105, global_step: 229, interval_runtime: 3.6463, interval_samples_per_second: 4.388005366390592, interval_steps_per_second: 0.274250335399412, ppl: 1.2640065655810742, epoch: 1.987
[2023-12-26 17:55:49,800] [ INFO] - loss: 0.26624548, learning_rate: 0.0001095, global_step: 230, interval_runtime: 3.3999, interval_samples_per_second: 4.7060696633521895, interval_steps_per_second: 0.29412935395951184, ppl: 1.3050553843642994, epoch: 1.9957
[2023-12-26 17:55:51,041] [ INFO] - ***** Running Evaluation *****
[2023-12-26 17:55:51,048] [ INFO] - Num examples = 206
[2023-12-26 17:55:51,049] [ INFO] - Total prediction steps = 26
[2023-12-26 17:55:51,049] [ INFO] - Pre device batch size = 8
[2023-12-26 17:55:51,049] [ INFO] - Total Batch size = 8
[2023-12-26 17:56:05,578] [ INFO] - eval_loss: 0.24841691553592682, eval_accuracy: 0.9998247458815283, eval_runtime: 14.5366, eval_samples_per_second: 14.171123774194562, eval_steps_per_second: 1.788588437519702, eval_ppl: 1.2819943041346622, epoch: 1.9957
[2023-12-26 17:56:05,579] [ INFO] - Saving model checkpoint to ./checkpoints/llama_lora_ckpts/checkpoint-230
[2023-12-26 17:56:05,579] [ INFO] - tokenizer config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-230/tokenizer_config.json
[2023-12-26 17:56:05,580] [ INFO] - Special tokens file saved in ./checkpoints/llama_lora_ckpts/checkpoint-230/special_tokens_map.json
[2023-12-26 17:56:05,583] [ INFO] - Chat-template config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-230/chat_template.json
[2023-12-26 17:56:05,760] [ INFO] - Saving optimizer files.
[2023-12-26 17:56:10,125] [ INFO] - loss: 0.3656939, learning_rate: 0.0001086, global_step: 231, interval_runtime: 20.3245, interval_samples_per_second: 0.7872267269065429, interval_steps_per_second: 0.049201670431658934, ppl: 1.441513927701439, epoch: 2.0087
[2023-12-26 17:56:13,735] [ INFO] - loss: 0.25071743, learning_rate: 0.0001076, global_step: 232, interval_runtime: 3.61, interval_samples_per_second: 4.432105284987014, interval_steps_per_second: 0.2770065803116884, ppl: 1.284946945569142, epoch: 2.0173
[2023-12-26 17:56:16,630] [ INFO] - loss: 0.23165847, learning_rate: 0.0001067, global_step: 233, interval_runtime: 2.8955, interval_samples_per_second: 5.525760972168335, interval_steps_per_second: 0.34536006076052095, ppl: 1.260689092149201, epoch: 2.026
[2023-12-26 17:56:19,615] [ INFO] - loss: 0.27415121, learning_rate: 0.0001057, global_step: 234, interval_runtime: 2.9846, interval_samples_per_second: 5.360863817454984, interval_steps_per_second: 0.3350539885909365, ppl: 1.3154136909055696, epoch: 2.0346
[2023-12-26 17:56:22,586] [ INFO] - loss: 0.28157312, 
learning_rate: 0.0001048, global_step: 235, interval_runtime: 2.9716, interval_samples_per_second: 5.3843227946127294, interval_steps_per_second: 0.3365201746632956, ppl: 1.325212892345648, epoch: 2.0433 [2023-12-26 17:56:25,969] [ INFO] - loss: 0.25894362, learning_rate: 0.0001038, global_step: 236, interval_runtime: 3.3824, interval_samples_per_second: 4.730322905159614, interval_steps_per_second: 0.2956451815724759, ppl: 1.2955607590533118, epoch: 2.0519 [2023-12-26 17:56:29,945] [ INFO] - loss: 0.26484933, learning_rate: 0.0001029, global_step: 237, interval_runtime: 3.9764, interval_samples_per_second: 4.023722798639813, interval_steps_per_second: 0.2514826749149883, ppl: 1.303234602627391, epoch: 2.0606 [2023-12-26 17:56:33,242] [ INFO] - loss: 0.2394658, learning_rate: 0.0001019, global_step: 238, interval_runtime: 3.2964, interval_samples_per_second: 4.853791011648139, interval_steps_per_second: 0.30336193822800867, ppl: 1.2705702303809643, epoch: 2.0693 [2023-12-26 17:56:36,414] [ INFO] - loss: 0.23598538, learning_rate: 0.000101, global_step: 239, interval_runtime: 3.1719, interval_samples_per_second: 5.044337250358957, interval_steps_per_second: 0.3152710781474348, ppl: 1.2661557988337833, epoch: 2.0779 [2023-12-26 17:56:39,721] [ INFO] - loss: 0.25548252, learning_rate: 0.0001, global_step: 240, interval_runtime: 3.3075, interval_samples_per_second: 4.8375500000360425, interval_steps_per_second: 0.30234687500225266, ppl: 1.2910844446645773, epoch: 2.0866 [2023-12-26 17:56:42,374] [ INFO] - loss: 0.25759581, learning_rate: 9.905e-05, global_step: 241, interval_runtime: 2.6527, interval_samples_per_second: 6.031540701410583, interval_steps_per_second: 0.3769712938381614, ppl: 1.293815765530674, epoch: 2.0952 [2023-12-26 17:56:45,520] [ INFO] - loss: 0.21545997, learning_rate: 9.81e-05, global_step: 242, interval_runtime: 3.1461, interval_samples_per_second: 5.085716654476958, interval_steps_per_second: 0.31785729090480985, ppl: 1.2404323274232008, epoch: 
2.1039 [2023-12-26 17:56:48,992] [ INFO] - loss: 0.28708792, learning_rate: 9.714e-05, global_step: 243, interval_runtime: 3.4717, interval_samples_per_second: 4.608658078343157, interval_steps_per_second: 0.2880411298964473, ppl: 1.332541365362446, epoch: 2.1126 [2023-12-26 17:56:52,167] [ INFO] - loss: 0.26709995, learning_rate: 9.619e-05, global_step: 244, interval_runtime: 3.1752, interval_samples_per_second: 5.039020141404312, interval_steps_per_second: 0.3149387588377695, ppl: 1.306170991597156, epoch: 2.1212 [2023-12-26 17:56:55,975] [ INFO] - loss: 0.23067287, learning_rate: 9.524e-05, global_step: 245, interval_runtime: 3.8081, interval_samples_per_second: 4.201598763252775, interval_steps_per_second: 0.26259992270329846, ppl: 1.2594471691001918, epoch: 2.1299 [2023-12-26 17:56:59,639] [ INFO] - loss: 0.2489226, learning_rate: 9.429e-05, global_step: 246, interval_runtime: 3.6639, interval_samples_per_second: 4.366963643875407, interval_steps_per_second: 0.27293522774221296, ppl: 1.2826427526786524, epoch: 2.1385 [2023-12-26 17:57:03,154] [ INFO] - loss: 0.2632488, learning_rate: 9.333e-05, global_step: 247, interval_runtime: 3.5152, interval_samples_per_second: 4.551726008993185, interval_steps_per_second: 0.2844828755620741, ppl: 1.3011504049042621, epoch: 2.1472 [2023-12-26 17:57:07,978] [ INFO] - loss: 0.26488072, learning_rate: 9.238e-05, global_step: 248, interval_runtime: 4.8246, interval_samples_per_second: 3.316333676947755, interval_steps_per_second: 0.20727085480923468, ppl: 1.3032755118036337, epoch: 2.1558 [2023-12-26 17:57:11,275] [ INFO] - loss: 0.26563224, learning_rate: 9.143e-05, global_step: 249, interval_runtime: 3.2961, interval_samples_per_second: 4.854288515698945, interval_steps_per_second: 0.3033930322311841, ppl: 1.304255317541954, epoch: 2.1645 [2023-12-26 17:57:14,520] [ INFO] - loss: 0.23992111, learning_rate: 9.048e-05, global_step: 250, interval_runtime: 3.2451, interval_samples_per_second: 4.930568905885227, 
interval_steps_per_second: 0.3081605566178267, ppl: 1.2711488654317253, epoch: 2.1732 [2023-12-26 17:57:17,530] [ INFO] - loss: 0.25501767, learning_rate: 8.952e-05, global_step: 251, interval_runtime: 3.0108, interval_samples_per_second: 5.314274790747383, interval_steps_per_second: 0.33214217442171146, ppl: 1.2904844235311916, epoch: 2.1818 [2023-12-26 17:57:21,339] [ INFO] - loss: 0.24364254, learning_rate: 8.857e-05, global_step: 252, interval_runtime: 3.8085, interval_samples_per_second: 4.201152140561123, interval_steps_per_second: 0.2625720087850702, ppl: 1.275888169979503, epoch: 2.1905 [2023-12-26 17:57:24,571] [ INFO] - loss: 0.21903639, learning_rate: 8.762e-05, global_step: 253, interval_runtime: 3.2316, interval_samples_per_second: 4.951084361900219, interval_steps_per_second: 0.3094427726187637, ppl: 1.2448765769219226, epoch: 2.1991 [2023-12-26 17:57:28,548] [ INFO] - loss: 0.2594893, learning_rate: 8.667e-05, global_step: 254, interval_runtime: 3.9779, interval_samples_per_second: 4.022264940905389, interval_steps_per_second: 0.2513915588065868, ppl: 1.2962679135708033, epoch: 2.2078 [2023-12-26 17:57:31,999] [ INFO] - loss: 0.24149872, learning_rate: 8.571e-05, global_step: 255, interval_runtime: 3.4502, interval_samples_per_second: 4.637357737953826, interval_steps_per_second: 0.2898348586221141, ppl: 1.2731558252770274, epoch: 2.2165 [2023-12-26 17:57:35,627] [ INFO] - loss: 0.23661155, learning_rate: 8.476e-05, global_step: 256, interval_runtime: 3.6282, interval_samples_per_second: 4.409906825259528, interval_steps_per_second: 0.2756191765787205, ppl: 1.2669488758849545, epoch: 2.2251 [2023-12-26 17:57:39,345] [ INFO] - loss: 0.25656974, learning_rate: 8.381e-05, global_step: 257, interval_runtime: 3.7176, interval_samples_per_second: 4.303832502341623, interval_steps_per_second: 0.26898953139635146, ppl: 1.2924889008325786, epoch: 2.2338 [2023-12-26 17:57:42,004] [ INFO] - loss: 0.24968293, learning_rate: 8.286e-05, global_step: 258, 
interval_runtime: 2.6592, interval_samples_per_second: 6.016841883666504, interval_steps_per_second: 0.3760526177291565, ppl: 1.2836183552857618, epoch: 2.2424 [2023-12-26 17:57:45,878] [ INFO] - loss: 0.25985175, learning_rate: 8.19e-05, global_step: 259, interval_runtime: 3.874, interval_samples_per_second: 4.1300723405708775, interval_steps_per_second: 0.25812952128567984, ppl: 1.2967378310317246, epoch: 2.2511 [2023-12-26 17:57:49,201] [ INFO] - loss: 0.25258374, learning_rate: 8.095e-05, global_step: 260, interval_runtime: 3.3237, interval_samples_per_second: 4.813872911316932, interval_steps_per_second: 0.30086705695730825, ppl: 1.2873472941036401, epoch: 2.2597 [2023-12-26 17:57:53,102] [ INFO] - loss: 0.27856874, learning_rate: 8e-05, global_step: 261, interval_runtime: 3.901, interval_samples_per_second: 4.10154574847204, interval_steps_per_second: 0.2563466092795025, ppl: 1.3212374241350473, epoch: 2.2684 [2023-12-26 17:57:56,114] [ INFO] - loss: 0.23018984, learning_rate: 7.905e-05, global_step: 262, interval_runtime: 3.0118, interval_samples_per_second: 5.312472981014336, interval_steps_per_second: 0.332029561313396, ppl: 1.258838965236283, epoch: 2.2771 [2023-12-26 17:57:59,266] [ INFO] - loss: 0.25268465, learning_rate: 7.81e-05, global_step: 263, interval_runtime: 3.1513, interval_samples_per_second: 5.077254407965167, interval_steps_per_second: 0.31732840049782296, ppl: 1.2874772068737268, epoch: 2.2857 [2023-12-26 17:58:02,669] [ INFO] - loss: 0.25480685, learning_rate: 7.714e-05, global_step: 264, interval_runtime: 3.4035, interval_samples_per_second: 4.7010985877901765, interval_steps_per_second: 0.29381866173688603, ppl: 1.2902123922808444, epoch: 2.2944 [2023-12-26 17:58:05,789] [ INFO] - loss: 0.26835781, learning_rate: 7.619e-05, global_step: 265, interval_runtime: 3.1199, interval_samples_per_second: 5.128435359032087, interval_steps_per_second: 0.32052720993950545, ppl: 1.3078150055936044, epoch: 2.303 [2023-12-26 17:58:09,360] [ INFO] - 
loss: 0.2600894, learning_rate: 7.524e-05, global_step: 266, interval_runtime: 3.571, interval_samples_per_second: 4.480539634085072, interval_steps_per_second: 0.280033727130317, ppl: 1.2970460373984403, epoch: 2.3117 [2023-12-26 17:58:12,487] [ INFO] - loss: 0.24706353, learning_rate: 7.429e-05, global_step: 267, interval_runtime: 3.1274, interval_samples_per_second: 5.116001172484428, interval_steps_per_second: 0.31975007328027677, ppl: 1.2802604451408, epoch: 2.3203 [2023-12-26 17:58:15,359] [ INFO] - loss: 0.23921801, learning_rate: 7.333e-05, global_step: 268, interval_runtime: 2.8716, interval_samples_per_second: 5.571803589560932, interval_steps_per_second: 0.34823772434755823, ppl: 1.2702554347867892, epoch: 2.329 [2023-12-26 17:58:18,433] [ INFO] - loss: 0.24235269, learning_rate: 7.238e-05, global_step: 269, interval_runtime: 3.0745, interval_samples_per_second: 5.204087189128849, interval_steps_per_second: 0.32525544932055306, ppl: 1.274243526523154, epoch: 2.3377 [2023-12-26 17:58:21,776] [ INFO] - loss: 0.24522433, learning_rate: 7.143e-05, global_step: 270, interval_runtime: 3.3424, interval_samples_per_second: 4.786993651473001, interval_steps_per_second: 0.29918710321706254, ppl: 1.2779079541439566, epoch: 2.3463 [2023-12-26 17:58:24,804] [ INFO] - loss: 0.22888985, learning_rate: 7.048e-05, global_step: 271, interval_runtime: 3.0278, interval_samples_per_second: 5.284330203548979, interval_steps_per_second: 0.3302706377218112, ppl: 1.2572035504116417, epoch: 2.355 [2023-12-26 17:58:28,146] [ INFO] - loss: 0.26895219, learning_rate: 6.952e-05, global_step: 272, interval_runtime: 3.3413, interval_samples_per_second: 4.7884938344333285, interval_steps_per_second: 0.29928086465208303, ppl: 1.3085925757398087, epoch: 2.3636 [2023-12-26 17:58:31,349] [ INFO] - loss: 0.25630388, learning_rate: 6.857e-05, global_step: 273, interval_runtime: 3.2042, interval_samples_per_second: 4.993461661492928, interval_steps_per_second: 0.312091353843308, ppl: 
1.2921453254069084, epoch: 2.3723 [2023-12-26 17:58:34,611] [ INFO] - loss: 0.2489031, learning_rate: 6.762e-05, global_step: 274, interval_runtime: 3.2619, interval_samples_per_second: 4.905145051417184, interval_steps_per_second: 0.306571565713574, ppl: 1.282617741388836, epoch: 2.381 [2023-12-26 17:58:38,113] [ INFO] - loss: 0.25413105, learning_rate: 6.667e-05, global_step: 275, interval_runtime: 3.5017, interval_samples_per_second: 4.569229250370903, interval_steps_per_second: 0.2855768281481814, ppl: 1.2893407613034216, epoch: 2.3896 [2023-12-26 17:58:41,438] [ INFO] - loss: 0.25917414, learning_rate: 6.571e-05, global_step: 276, interval_runtime: 3.3234, interval_samples_per_second: 4.8142790303213605, interval_steps_per_second: 0.30089243939508503, ppl: 1.2958594461448403, epoch: 2.3983 [2023-12-26 17:58:44,153] [ INFO] - loss: 0.23144963, learning_rate: 6.476e-05, global_step: 277, interval_runtime: 2.7165, interval_samples_per_second: 5.889941624433201, interval_steps_per_second: 0.36812135152707504, ppl: 1.2604258373292216, epoch: 2.4069 [2023-12-26 17:58:47,813] [ INFO] - loss: 0.24212955, learning_rate: 6.381e-05, global_step: 278, interval_runtime: 3.6594, interval_samples_per_second: 4.3722818309427405, interval_steps_per_second: 0.2732676144339213, ppl: 1.2739592235435087, epoch: 2.4156 [2023-12-26 17:58:51,909] [ INFO] - loss: 0.26929981, learning_rate: 6.286e-05, global_step: 279, interval_runtime: 4.0964, interval_samples_per_second: 3.9058749112927673, interval_steps_per_second: 0.24411718195579796, ppl: 1.3090475477650936, epoch: 2.4242 [2023-12-26 17:58:55,586] [ INFO] - loss: 0.24498856, learning_rate: 6.19e-05, global_step: 280, interval_runtime: 3.6778, interval_samples_per_second: 4.350392832018189, interval_steps_per_second: 0.2718995520011368, ppl: 1.2776066973006666, epoch: 2.4329 [2023-12-26 17:58:59,000] [ INFO] - loss: 0.27687347, learning_rate: 6.095e-05, global_step: 281, interval_runtime: 3.4139, interval_samples_per_second: 
4.686728616429989, interval_steps_per_second: 0.29292053852687433, ppl: 1.3189994674734082, epoch: 2.4416 [2023-12-26 17:59:02,266] [ INFO] - loss: 0.28048354, learning_rate: 6e-05, global_step: 282, interval_runtime: 3.2653, interval_samples_per_second: 4.900004483167459, interval_steps_per_second: 0.3062502801979662, ppl: 1.323769753232936, epoch: 2.4502 [2023-12-26 17:59:05,672] [ INFO] - loss: 0.25545651, learning_rate: 5.905e-05, global_step: 283, interval_runtime: 3.406, interval_samples_per_second: 4.6976205739524115, interval_steps_per_second: 0.2936012858720257, ppl: 1.29105086399489, epoch: 2.4589 [2023-12-26 17:59:08,913] [ INFO] - loss: 0.23756577, learning_rate: 5.81e-05, global_step: 284, interval_runtime: 3.2419, interval_samples_per_second: 4.9353487782727425, interval_steps_per_second: 0.3084592986420464, ppl: 1.2681584008259699, epoch: 2.4675 [2023-12-26 17:59:12,223] [ INFO] - loss: 0.23696953, learning_rate: 5.714e-05, global_step: 285, interval_runtime: 3.3095, interval_samples_per_second: 4.834494743523587, interval_steps_per_second: 0.3021559214702242, ppl: 1.2674024994327784, epoch: 2.4762 [2023-12-26 17:59:15,924] [ INFO] - loss: 0.22214374, learning_rate: 5.619e-05, global_step: 286, interval_runtime: 3.7008, interval_samples_per_second: 4.323410216305595, interval_steps_per_second: 0.27021313851909967, ppl: 1.2487508604132393, epoch: 2.4848 [2023-12-26 17:59:18,926] [ INFO] - loss: 0.26022163, learning_rate: 5.524e-05, global_step: 287, interval_runtime: 3.0011, interval_samples_per_second: 5.331368640551071, interval_steps_per_second: 0.33321054003444195, ppl: 1.297217557135743, epoch: 2.4935 [2023-12-26 17:59:22,381] [ INFO] - loss: 0.25578877, learning_rate: 5.429e-05, global_step: 288, interval_runtime: 3.4561, interval_samples_per_second: 4.62943660989538, interval_steps_per_second: 0.2893397881184612, ppl: 1.291479899826737, epoch: 2.5022 [2023-12-26 17:59:26,236] [ INFO] - loss: 0.25826275, learning_rate: 5.333e-05, global_step: 
289, interval_runtime: 3.8551, interval_samples_per_second: 4.150364865363805, interval_steps_per_second: 0.2593978040852378, ppl: 1.2946789508317431, epoch: 2.5108 [2023-12-26 17:59:29,565] [ INFO] - loss: 0.25632015, learning_rate: 5.238e-05, global_step: 290, interval_runtime: 3.3288, interval_samples_per_second: 4.806578254975566, interval_steps_per_second: 0.3004111409359729, ppl: 1.2921663487823773, epoch: 2.5195 [2023-12-26 17:59:32,303] [ INFO] - loss: 0.23544586, learning_rate: 5.143e-05, global_step: 291, interval_runtime: 2.7378, interval_samples_per_second: 5.844061328239773, interval_steps_per_second: 0.3652538330149858, ppl: 1.2654728667015342, epoch: 2.5281 [2023-12-26 17:59:36,206] [ INFO] - loss: 0.2519612, learning_rate: 5.048e-05, global_step: 292, interval_runtime: 3.9034, interval_samples_per_second: 4.098974158007226, interval_steps_per_second: 0.25618588487545163, ppl: 1.286546118327028, epoch: 2.5368 [2023-12-26 17:59:39,389] [ INFO] - loss: 0.25703761, learning_rate: 4.952e-05, global_step: 293, interval_runtime: 3.1824, interval_samples_per_second: 5.027707474921347, interval_steps_per_second: 0.3142317171825842, ppl: 1.2930937591010965, epoch: 2.5455 [2023-12-26 17:59:42,644] [ INFO] - loss: 0.25393027, learning_rate: 4.857e-05, global_step: 294, interval_runtime: 3.255, interval_samples_per_second: 4.915461133872737, interval_steps_per_second: 0.3072163208670461, ppl: 1.2890819134519724, epoch: 2.5541 [2023-12-26 17:59:46,200] [ INFO] - loss: 0.2280623, learning_rate: 4.762e-05, global_step: 295, interval_runtime: 3.556, interval_samples_per_second: 4.499463521579386, interval_steps_per_second: 0.28121647009871165, ppl: 1.2561635819857848, epoch: 2.5628 [2023-12-26 17:59:49,439] [ INFO] - loss: 0.25715587, learning_rate: 4.667e-05, global_step: 296, interval_runtime: 3.2288, interval_samples_per_second: 4.955421057314687, interval_steps_per_second: 0.3097138160821679, ppl: 1.2932466894116388, epoch: 2.5714 [2023-12-26 17:59:53,245] [ 
INFO] - loss: 0.23208797, learning_rate: 4.571e-05, global_step: 297, interval_runtime: 3.8165, interval_samples_per_second: 4.1922906524141785, interval_steps_per_second: 0.26201816577588616, ppl: 1.2612306744107442, epoch: 2.5801 [2023-12-26 17:59:56,236] [ INFO] - loss: 0.25757715, learning_rate: 4.476e-05, global_step: 298, interval_runtime: 2.9911, interval_samples_per_second: 5.349231498268504, interval_steps_per_second: 0.3343269686417815, ppl: 1.293791623153738, epoch: 2.5887 [2023-12-26 17:59:58,780] [ INFO] - loss: 0.23431894, learning_rate: 4.381e-05, global_step: 299, interval_runtime: 2.5437, interval_samples_per_second: 6.290102448579346, interval_steps_per_second: 0.39313140303620914, ppl: 1.2640475832596356, epoch: 2.5974 [2023-12-26 18:00:02,451] [ INFO] - loss: 0.26020345, learning_rate: 4.286e-05, global_step: 300, interval_runtime: 3.6713, interval_samples_per_second: 4.358165578026837, interval_steps_per_second: 0.2723853486266773, ppl: 1.297193973934926, epoch: 2.6061 [2023-12-26 18:00:06,312] [ INFO] - loss: 0.2662026, learning_rate: 4.19e-05, global_step: 301, interval_runtime: 3.8606, interval_samples_per_second: 4.144439144325265, interval_steps_per_second: 0.25902744652032905, ppl: 1.3049994247891998, epoch: 2.6147 [2023-12-26 18:00:09,488] [ INFO] - loss: 0.22707528, learning_rate: 4.095e-05, global_step: 302, interval_runtime: 3.1759, interval_samples_per_second: 5.037924252989906, interval_steps_per_second: 0.3148702658118691, ppl: 1.2549243350884367, epoch: 2.6234 [2023-12-26 18:00:12,644] [ INFO] - loss: 0.25288558, learning_rate: 4e-05, global_step: 303, interval_runtime: 2.9885, interval_samples_per_second: 5.353831891938681, interval_steps_per_second: 0.33461449324616754, ppl: 1.2877359256602163, epoch: 2.632 [2023-12-26 18:00:15,878] [ INFO] - loss: 0.21039963, learning_rate: 3.905e-05, global_step: 304, interval_runtime: 3.4022, interval_samples_per_second: 4.702833429771899, interval_steps_per_second: 0.2939270893607437, ppl: 
1.2341711732447127, epoch: 2.6407 [2023-12-26 18:00:19,129] [ INFO] - loss: 0.29113054, learning_rate: 3.81e-05, global_step: 305, interval_runtime: 3.2507, interval_samples_per_second: 4.921999834096974, interval_steps_per_second: 0.3076249896310609, ppl: 1.3379392271375368, epoch: 2.6494 [2023-12-26 18:00:22,181] [ INFO] - loss: 0.25979307, learning_rate: 3.714e-05, global_step: 306, interval_runtime: 3.052, interval_samples_per_second: 5.242421288137288, interval_steps_per_second: 0.3276513305085805, ppl: 1.296661740688312, epoch: 2.658 [2023-12-26 18:00:24,877] [ INFO] - loss: 0.26546153, learning_rate: 3.619e-05, global_step: 307, interval_runtime: 2.6959, interval_samples_per_second: 5.934965613583775, interval_steps_per_second: 0.37093535084898593, ppl: 1.3040326871198566, epoch: 2.6667 [2023-12-26 18:00:28,383] [ INFO] - loss: 0.24498808, learning_rate: 3.524e-05, global_step: 308, interval_runtime: 3.5058, interval_samples_per_second: 4.563913070503891, interval_steps_per_second: 0.2852445669064932, ppl: 1.277606084049599, epoch: 2.6753 [2023-12-26 18:00:31,397] [ INFO] - loss: 0.23861594, learning_rate: 3.429e-05, global_step: 309, interval_runtime: 3.0124, interval_samples_per_second: 5.311380206602021, interval_steps_per_second: 0.3319612629126263, ppl: 1.2694908822773268, epoch: 2.684 [2023-12-26 18:00:35,293] [ INFO] - loss: 0.26361945, learning_rate: 3.333e-05, global_step: 310, interval_runtime: 3.8973, interval_samples_per_second: 4.105387214429576, interval_steps_per_second: 0.2565867009018485, ppl: 1.3016327656898303, epoch: 2.6926 [2023-12-26 18:00:38,928] [ INFO] - loss: 0.25977874, learning_rate: 3.238e-05, global_step: 311, interval_runtime: 3.6351, interval_samples_per_second: 4.401570949953271, interval_steps_per_second: 0.27509818437207945, ppl: 1.2966431596587014, epoch: 2.7013 [2023-12-26 18:00:42,534] [ INFO] - loss: 0.26889122, learning_rate: 3.143e-05, global_step: 312, interval_runtime: 3.6068, interval_samples_per_second: 
4.436102060498853, interval_steps_per_second: 0.2772563787811783, ppl: 1.3085127932826588, epoch: 2.71 [2023-12-26 18:00:45,487] [ INFO] - loss: 0.23030058, learning_rate: 3.048e-05, global_step: 313, interval_runtime: 2.9525, interval_samples_per_second: 5.419194334419451, interval_steps_per_second: 0.3386996459012157, ppl: 1.2589783767823681, epoch: 2.7186 [2023-12-26 18:00:49,292] [ INFO] - loss: 0.22973362, learning_rate: 2.952e-05, global_step: 314, interval_runtime: 3.8053, interval_samples_per_second: 4.204660341211675, interval_steps_per_second: 0.2627912713257297, ppl: 1.2582647887089293, epoch: 2.7273 [2023-12-26 18:00:54,526] [ INFO] - loss: 0.27658898, learning_rate: 2.857e-05, global_step: 315, interval_runtime: 5.2337, interval_samples_per_second: 3.057112854928451, interval_steps_per_second: 0.1910695534330282, ppl: 1.3186242786861664, epoch: 2.7359 [2023-12-26 18:00:57,782] [ INFO] - loss: 0.26146588, learning_rate: 2.762e-05, global_step: 316, interval_runtime: 3.2563, interval_samples_per_second: 4.913592525882645, interval_steps_per_second: 0.30709953286766534, ppl: 1.2988326246467194, epoch: 2.7446 [2023-12-26 18:01:01,895] [ INFO] - loss: 0.22772613, learning_rate: 2.667e-05, global_step: 317, interval_runtime: 4.1131, interval_samples_per_second: 3.8900119484687288, interval_steps_per_second: 0.24312574677929555, ppl: 1.2557413684461678, epoch: 2.7532 [2023-12-26 18:01:05,393] [ INFO] - loss: 0.25907451, learning_rate: 2.571e-05, global_step: 318, interval_runtime: 3.4974, interval_samples_per_second: 4.57482009390981, interval_steps_per_second: 0.2859262558693631, ppl: 1.2957303460994465, epoch: 2.7619 [2023-12-26 18:01:09,002] [ INFO] - loss: 0.25298631, learning_rate: 2.476e-05, global_step: 319, interval_runtime: 3.6097, interval_samples_per_second: 4.43251394835555, interval_steps_per_second: 0.27703212177222186, ppl: 1.287865645833255, epoch: 2.7706 [2023-12-26 18:01:12,718] [ INFO] - loss: 0.25091952, learning_rate: 2.381e-05, 
global_step: 320, interval_runtime: 3.7159, interval_samples_per_second: 4.3058071828059425, interval_steps_per_second: 0.2691129489253714, ppl: 1.2852066467379928, epoch: 2.7792 [2023-12-26 18:01:16,532] [ INFO] - loss: 0.23852175, learning_rate: 2.286e-05, global_step: 321, interval_runtime: 3.8137, interval_samples_per_second: 4.1953520513843365, interval_steps_per_second: 0.26220950321152103, ppl: 1.269371314562255, epoch: 2.7879 [2023-12-26 18:01:20,111] [ INFO] - loss: 0.2394225, learning_rate: 2.19e-05, global_step: 322, interval_runtime: 3.5789, interval_samples_per_second: 4.470648175683526, interval_steps_per_second: 0.2794155109802204, ppl: 1.2705152158810613, epoch: 2.7965 [2023-12-26 18:01:23,907] [ INFO] - loss: 0.23940259, learning_rate: 2.095e-05, global_step: 323, interval_runtime: 3.7965, interval_samples_per_second: 4.214424717206121, interval_steps_per_second: 0.2634015448253826, ppl: 1.2704899201749327, epoch: 2.8052 [2023-12-26 18:01:27,614] [ INFO] - loss: 0.27364457, learning_rate: 2e-05, global_step: 324, interval_runtime: 3.707, interval_samples_per_second: 4.316156378066712, interval_steps_per_second: 0.2697597736291695, ppl: 1.314747418507585, epoch: 2.8139 [2023-12-26 18:01:31,409] [ INFO] - loss: 0.2073748, learning_rate: 1.905e-05, global_step: 325, interval_runtime: 3.7944, interval_samples_per_second: 4.216775989895652, interval_steps_per_second: 0.26354849936847824, ppl: 1.2304436556503757, epoch: 2.8225 [2023-12-26 18:01:35,076] [ INFO] - loss: 0.27432245, learning_rate: 1.81e-05, global_step: 326, interval_runtime: 3.6666, interval_samples_per_second: 4.363658116977148, interval_steps_per_second: 0.27272863231107175, ppl: 1.3156389616331297, epoch: 2.8312 [2023-12-26 18:01:38,026] [ INFO] - loss: 0.23853159, learning_rate: 1.714e-05, global_step: 327, interval_runtime: 2.9507, interval_samples_per_second: 5.422525735818736, interval_steps_per_second: 0.338907858488671, ppl: 1.2693838052374444, epoch: 2.8398 [2023-12-26 
18:01:41,919] [ INFO] - loss: 0.24444909, learning_rate: 1.619e-05, global_step: 328, interval_runtime: 3.8926, interval_samples_per_second: 4.110366966863487, interval_steps_per_second: 0.25689793542896794, ppl: 1.2769176526918324, epoch: 2.8485 [2023-12-26 18:01:45,451] [ INFO] - loss: 0.2582615, learning_rate: 1.524e-05, global_step: 329, interval_runtime: 3.5324, interval_samples_per_second: 4.529516838633957, interval_steps_per_second: 0.2830948024146223, ppl: 1.2946773324840661, epoch: 2.8571 [2023-12-26 18:01:48,673] [ INFO] - loss: 0.24779966, learning_rate: 1.429e-05, global_step: 330, interval_runtime: 3.2218, interval_samples_per_second: 4.966123008457353, interval_steps_per_second: 0.31038268802858454, ppl: 1.2812032302259, epoch: 2.8658 [2023-12-26 18:01:52,729] [ INFO] - loss: 0.24498056, learning_rate: 1.333e-05, global_step: 331, interval_runtime: 4.0562, interval_samples_per_second: 3.9445352860847476, interval_steps_per_second: 0.24653345538029672, ppl: 1.2775964764879715, epoch: 2.8745 [2023-12-26 18:01:55,759] [ INFO] - loss: 0.22714457, learning_rate: 1.238e-05, global_step: 332, interval_runtime: 3.0296, interval_samples_per_second: 5.28113896366891, interval_steps_per_second: 0.3300711852293069, ppl: 1.2550112918081957, epoch: 2.8831 [2023-12-26 18:01:58,791] [ INFO] - loss: 0.27777711, learning_rate: 1.143e-05, global_step: 333, interval_runtime: 3.0253, interval_samples_per_second: 5.288713311737848, interval_steps_per_second: 0.3305445819836155, ppl: 1.320191906839008, epoch: 2.8918 [2023-12-26 18:02:02,712] [ INFO] - loss: 0.26041701, learning_rate: 1.048e-05, global_step: 334, interval_runtime: 3.9276, interval_samples_per_second: 4.073694137666119, interval_steps_per_second: 0.25460588360413244, ppl: 1.297471032263235, epoch: 2.9004 [2023-12-26 18:02:06,276] [ INFO] - loss: 0.24744098, learning_rate: 9.524e-06, global_step: 335, interval_runtime: 3.5639, interval_samples_per_second: 4.48946259600896, interval_steps_per_second: 
0.28059141225056, ppl: 1.280743770655688, epoch: 2.9091 [2023-12-26 18:02:09,363] [ INFO] - loss: 0.2456432, learning_rate: 8.571e-06, global_step: 336, interval_runtime: 3.0873, interval_samples_per_second: 5.182528243944213, interval_steps_per_second: 0.3239080152465133, ppl: 1.2784433435701654, epoch: 2.9177 [2023-12-26 18:02:12,762] [ INFO] - loss: 0.28566605, learning_rate: 7.619e-06, global_step: 337, interval_runtime: 3.3991, interval_samples_per_second: 4.70716161416221, interval_steps_per_second: 0.29419760088513813, ppl: 1.330648011142046, epoch: 2.9264 [2023-12-26 18:02:16,726] [ INFO] - loss: 0.25106084, learning_rate: 6.667e-06, global_step: 338, interval_runtime: 3.9641, interval_samples_per_second: 4.036175474060172, interval_steps_per_second: 0.2522609671287607, ppl: 1.2853882849755656, epoch: 2.9351 [2023-12-26 18:02:19,759] [ INFO] - loss: 0.26100892, learning_rate: 5.714e-06, global_step: 339, interval_runtime: 3.0327, interval_samples_per_second: 5.27582381013065, interval_steps_per_second: 0.3297389881331656, ppl: 1.2982392456761134, epoch: 2.9437 [2023-12-26 18:02:22,956] [ INFO] - loss: 0.25734669, learning_rate: 4.762e-06, global_step: 340, interval_runtime: 3.1968, interval_samples_per_second: 5.004943052240602, interval_steps_per_second: 0.3128089407650376, ppl: 1.2934934902914355, epoch: 2.9524 [2023-12-26 18:02:26,641] [ INFO] - loss: 0.26527035, learning_rate: 3.81e-06, global_step: 341, interval_runtime: 3.6853, interval_samples_per_second: 4.341564719683562, interval_steps_per_second: 0.2713477949802226, ppl: 1.3037834059802764, epoch: 2.961 [2023-12-26 18:02:29,608] [ INFO] - loss: 0.23022859, learning_rate: 2.857e-06, global_step: 342, interval_runtime: 2.9664, interval_samples_per_second: 5.3937902446375645, interval_steps_per_second: 0.3371118902898478, ppl: 1.2588877461913108, epoch: 2.9697 [2023-12-26 18:02:33,054] [ INFO] - loss: 0.2588667, learning_rate: 1.905e-06, global_step: 343, interval_runtime: 3.4466, 
interval_samples_per_second: 4.642287339553546, interval_steps_per_second: 0.2901429587220966, ppl: 1.2954611083523406, epoch: 2.9784
[2023-12-26 18:02:36,004] [ INFO] - loss: 0.26296732, learning_rate: 9.524e-07, global_step: 344, interval_runtime: 2.9501, interval_samples_per_second: 5.423617388552312, interval_steps_per_second: 0.3389760867845195, ppl: 1.3007842086291714, epoch: 2.987
[2023-12-26 18:02:39,677] [ INFO] - loss: 0.23235616, learning_rate: 0.0, global_step: 345, interval_runtime: 3.6724, interval_samples_per_second: 4.356782299180001, interval_steps_per_second: 0.27229889369875004, ppl: 1.2615689692269303, epoch: 2.9957
[2023-12-26 18:02:39,677] [ INFO] - ***** Running Evaluation *****
[2023-12-26 18:02:39,677] [ INFO] - Num examples = 206
[2023-12-26 18:02:39,677] [ INFO] - Total prediction steps = 26
[2023-12-26 18:02:39,678] [ INFO] - Pre device batch size = 8
[2023-12-26 18:02:39,678] [ INFO] - Total Batch size = 8
[2023-12-26 18:02:54,183] [ INFO] - eval_loss: 0.2482166439294815, eval_accuracy: 1.0, eval_runtime: 14.5045, eval_samples_per_second: 14.202442044521213, eval_steps_per_second: 1.792541228920153, eval_ppl: 1.2817375827837763, epoch: 2.9957
[2023-12-26 18:02:54,183] [ INFO] - Saving model checkpoint to ./checkpoints/llama_lora_ckpts/checkpoint-345
[2023-12-26 18:02:54,184] [ INFO] - tokenizer config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-345/tokenizer_config.json
[2023-12-26 18:02:54,184] [ INFO] - Special tokens file saved in ./checkpoints/llama_lora_ckpts/checkpoint-345/special_tokens_map.json
[2023-12-26 18:02:54,187] [ INFO] - Chat-template config file saved in ./checkpoints/llama_lora_ckpts/checkpoint-345/chat_template.json
[2023-12-26 18:02:54,362] [ INFO] - Saving optimizer files.
[2023-12-26 18:02:55,144] [ INFO] - Training completed.
[2023-12-26 18:02:55,144] [ INFO] - Loading best model from ./checkpoints/llama_lora_ckpts/checkpoint-345 (score: 1.0).
[2023-12-26 18:02:55,235] [ INFO] - Load lora weight successfully
[2023-12-26 18:02:55,236] [ INFO] - set state-dict :None
[2023-12-26 18:02:55,237] [ INFO] - train_runtime: 1233.9516, train_samples_per_second: 4.492882786697039, train_steps_per_second: 0.27958956735398244, train_loss: 0.4301475836315017, epoch: 2.9957
[2023-12-26 18:02:55,238] [ INFO] - Saving model checkpoint to ./checkpoints/llama_lora_ckpts
[2023-12-26 18:02:55,238] [ INFO] - tokenizer config file saved in ./checkpoints/llama_lora_ckpts/tokenizer_config.json
[2023-12-26 18:02:55,238] [ INFO] - Special tokens file saved in ./checkpoints/llama_lora_ckpts/special_tokens_map.json
[2023-12-26 18:02:55,241] [ INFO] - Chat-template config file saved in ./checkpoints/llama_lora_ckpts/chat_template.json
[2023-12-26 18:02:55,383] [ INFO] - ***** train metrics *****
[2023-12-26 18:02:55,383] [ INFO] - epoch = 2.9957
[2023-12-26 18:02:55,384] [ INFO] - train_loss = 0.4301
[2023-12-26 18:02:55,384] [ INFO] - train_runtime = 0:20:33.95
[2023-12-26 18:02:55,384] [ INFO] - train_samples_per_second = 4.4929
[2023-12-26 18:02:55,384] [ INFO] - train_steps_per_second = 0.2796
[2023-12-26 18:02:55,400] [ INFO] - ***** Running Evaluation *****
[2023-12-26 18:02:55,400] [ INFO] - Num examples = 206
[2023-12-26 18:02:55,400] [ INFO] - Total prediction steps = 26
[2023-12-26 18:02:55,400] [ INFO] - Pre device batch size = 8
[2023-12-26 18:02:55,401] [ INFO] - Total Batch size = 8
[2023-12-26 18:03:09,938] [ INFO] - eval_loss: 0.2482166439294815, eval_accuracy: 1.0, eval_runtime: 14.5378, eval_samples_per_second: 14.169953149904021, eval_steps_per_second: 1.7884406888228377, eval_ppl: 1.2817375827837763, epoch: 2.9957
[2023-12-26 18:03:09,938] [ INFO] - ***** eval metrics *****
[2023-12-26 18:03:09,938] [ INFO] - epoch = 2.9957
[2023-12-26 18:03:09,939] [ INFO] - eval_accuracy = 1.0
[2023-12-26 18:03:09,939] [ INFO] - eval_loss = 0.2482
[2023-12-26 18:03:09,939] [ INFO] - eval_ppl = 1.2817
[2023-12-26 18:03:09,939] [ INFO] - eval_runtime = 0:00:14.53
[2023-12-26 18:03:09,939] [ INFO] - eval_samples_per_second = 14.17
[2023-12-26 18:03:09,939] [ INFO] - eval_steps_per_second = 1.7884
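The perplexity values in the log are the exponential of the corresponding loss, and the steps-per-second figure follows from the step count and the total runtime. The sketch below re-derives both from numbers copied out of the log above. `parse_metrics` is a hypothetical helper written only for this illustration (it is not part of PaddleNLP), and the evaluation record is abridged.

```python
import math
import re

# Evaluation record copied from the log above
# (timestamp prefix dropped, some fields omitted for brevity).
EVAL_RECORD = (
    "eval_loss: 0.2482166439294815, eval_accuracy: 1.0, "
    "eval_ppl: 1.2817375827837763, epoch: 2.9957"
)


def parse_metrics(record):
    """Hypothetical helper: turn 'key: value' pairs of a log record into floats."""
    pairs = re.findall(r"(\w+): ([-+\d.eE]+)", record)
    return {key: float(value) for key, value in pairs}


metrics = parse_metrics(EVAL_RECORD)

# Perplexity is exp(loss); this should reproduce the logged eval_ppl.
recomputed_ppl = math.exp(metrics["eval_loss"])

# 345 global steps in 1233.9516 s should reproduce train_steps_per_second.
steps_per_second = 345 / 1233.9516

print(f"recomputed ppl:   {recomputed_ppl:.4f}")
print(f"steps per second: {steps_per_second:.4f}")
```

Both recomputed values should agree with the `eval_ppl` and `train_steps_per_second` entries reported in the metrics summary.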
3. Load the LoRA model and test the result of fine-tuning
In [1]
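import get_result  # project helper module that provides generate()
from paddlenlp.peft import LoRAModel
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in half precision on a single card (no tensor parallelism)
model = AutoModelForCausalLM.from_pretrained(
    '/home/aistudio/Baichuan2-7B-Chat',
    dtype="float16",
    tensor_parallel_degree=0,
    tensor_parallel_rank=0,
)
# Load the LoRA weights saved by the training step on top of the base model
model = LoRAModel.from_pretrained(model, '/home/aistudio/PaddleNLP/llm/checkpoints/llama_lora_ckpts')
model.eval()  # switch to inference mode
# The tokenizer files were saved next to the LoRA checkpoint; pad on the left for generation
tokenizer = AutoTokenizer.from_pretrained('/home/aistudio/PaddleNLP/llm/checkpoints/llama_lora_ckpts', padding_side="left")
# Prompt (kept in Chinese, as in the training data): "I have a cold with a slight cough,
# fever and headache; I feel thirsty but have difficulty urinating"
result = get_result.generate(model, tokenizer, "我感冒了,有点咳嗽,发热,头疼,有口渴但是小便不利")
print(result)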
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
[2023-12-27 13:05:31,800] [ INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load '/home/aistudio/Baichuan2-7B-Chat'.
[2023-12-27 13:05:31,802] [ INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/config.json
[2023-12-27 13:05:31,804] [ INFO] - Loading weights file /home/aistudio/Baichuan2-7B-Chat/model.safetensors.index.json
W1227 13:05:31.809015 3307 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W1227 13:05:31.810439 3307 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
Loading checkpoint shards: 100%|██████████| 4/4 [03:49<00:00, 57.38s/it]
[2023-12-27 13:09:37,056] [ INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.
[2023-12-27 13:09:37,057] [ INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at /home/aistudio/Baichuan2-7B-Chat. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2023-12-27 13:09:37,061] [ INFO] - Loading configuration file /home/aistudio/Baichuan2-7B-Chat/generation_config.json
[2023-12-27 13:09:37,228] [ WARNING] - Reset tensor_parallel_degree of lora_config to 0.
[2023-12-27 13:09:37,296] [ INFO] - Loading the LoRA weights from /home/aistudio/PaddleNLP/llm/checkpoints/llama_lora_ckpts/lora_model_state.pdparams
[2023-12-27 13:09:37,355] [ INFO] - Load lora weight successfully
[2023-12-27 13:09:37,366] [ INFO] - We are using <class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'> to load '/home/aistudio/PaddleNLP/llm/checkpoints/llama_lora_ckpts'.
的症状<br><reserved_203>诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸 如果有发热但小便不热的症状,<reserved_221>诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸 这个<reserved_264>诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸D诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸j诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或保济丸 建议食疗:葱白姜汤或盐蒸橙子或烤橘子或蜂蜜浸白萝卜汁 建议保健品:益生菌类(如酸奶、酵母片)或蒜素补充剂 or 五谷杂粮粥或坚果类食物 建议养生保健方法:按揉天枢穴或艾灸中脘穴或推拿按摩 建议食疗方:生姜汤或砂糖汤或米醋汤或姜蜜蒸柠檬水 建议中成药:九味羌活丸或麻桂感冒丸<reserved_288>诊断:太阳伤寒蓄水。建议处方:五苓散。建议中成药:五苓胶囊或藿香正气水或
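The fine-tuning appears to work reasonably well: the syndrome differentiation (太阳伤寒蓄水) and the recommended formula (五苓散) are fairly accurate for the symptoms described. Note, however, that the raw output repeats the same block several times, contains <reserved_N> tokens, and is cut off mid-sentence.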
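Because the raw generation repeats the same diagnosis block and carries stray tokens, a small post-processing step can make it easier to read. The following is a minimal sketch, assuming that every answer starts with the marker 诊断: as in the training targets. `clean_generation` is a hypothetical helper and is not part of the original project or of PaddleNLP.

```python
import re


def clean_generation(text: str) -> str:
    """Hypothetical helper: strip stray tokens and keep only the first diagnosis block."""
    # Remove <reserved_N> tokens and <br> tags like those seen in the output above.
    text = re.sub(r"<reserved_\d+>|<br\s*/?>", "", text)
    # Assumption: each answer starts with this marker, as in the training data.
    marker = "诊断:"
    first = text.find(marker)
    if first == -1:
        # No marker found: return the de-tokenized text unchanged.
        return text.strip()
    # Cut at the next occurrence of the marker, if the block is repeated.
    second = text.find(marker, first + len(marker))
    block = text[first:second] if second != -1 else text[first:]
    return block.strip()


raw = (
    "的症状<br><reserved_203>诊断:太阳伤寒蓄水。建议处方:五苓散。"
    "<reserved_221>诊断:太阳伤寒蓄水。建议处方:五苓散。"
)
print(clean_generation(raw))
```

This only trims the text after generation. Any free text the model emits between two blocks stays attached to the first one, and the repetition itself would more properly be addressed through the generation settings or the training data.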
八、Publishing with Gradio
Open main.gradio.py in the project root directory and click 应用部署 (Deploy Application) to publish the app.
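The contents of main.gradio.py are not shown here. As a rough idea of what such an entry point can look like, the sketch below wraps a text-in, text-out function in a Gradio interface. The names `diagnose` and `build_demo`, the labels, and the title are all assumptions made for illustration, and `diagnose` is a stand-in that does not call the fine-tuned model.

```python
def diagnose(symptoms: str) -> str:
    """Hypothetical stand-in for the model call used by the web UI."""
    symptoms = symptoms.strip()
    if not symptoms:
        return "Please describe your symptoms."
    # A real app would run the LoRA fine-tuned model here instead.
    return f"[model output for: {symptoms}]"


def build_demo():
    """Build the Gradio interface (gradio is imported lazily, only when needed)."""
    import gradio as gr

    return gr.Interface(
        fn=diagnose,
        inputs=gr.Textbox(label="Symptoms"),
        outputs=gr.Textbox(label="Suggestion"),
        title="TCM assistant (illustrative sketch)",
    )
```

Calling `build_demo().launch()` would start the web UI locally. In the actual project, `diagnose` would be replaced by a call into the loaded model and tokenizer.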