Choosing Fine-Tuning Hyperparameters for BERT NER


[Figure: validation weighted-avg F1 vs. training epoch for each (batch size, learning rate) combination; produced by the plotting code below]
Convergence-speed tests were run over combinations of batch size and learning rate. Conclusions:

  • For the same number of epochs, batch size 32 performs many more optimizer steps than batch size 256 and therefore converges faster (see the sketch after this list)
  • The larger the batch, the higher the learning rate can reasonably be set
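
To make the first point concrete: with the 7,824 training sentences reported in the log below, one epoch is about 245 optimizer steps at batch size 32 but only 31 at batch size 256, roughly an 8x difference in parameter updates per epoch. A quick check (the dataset size comes from the log; the rest is arithmetic):

import math

train_size = 7824  # "TRAIN Dataset: 7824" in the log below

for batch_size in (32, 256):
    steps_per_epoch = math.ceil(train_size / batch_size)
    print(f"batch_size={batch_size:>3}: {steps_per_epoch} steps per epoch")

# batch_size= 32: 245 steps per epoch
# batch_size=256: 31 steps per epoch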

Plotting code (generated by DeepSeek). Each trajectory is the per-epoch validation weighted-avg F1, taken from logs like the one in the appendix:

import matplotlib.pyplot as plt

dic = {
    (256, 1e-5): [0,        0.185357, 0.549124, 0.649283, 0.720528, 0.743900],
    (256, 2e-5): [0.086368, 0.604535, 0.731870, 0.763409, 0.773608, 0.781042],
    (256, 3e-5): [0.415224, 0.715375, 0.753391, 0.771326, 0.784421, 0.783432],
    (32,  1e-5): [0.710058, 0.769245, 0.781832, 0.786909, 0.792920, 0.799076],
    (32,  2e-5): [0.761296, 0.766089, 0.795317, 0.801602, 0.795861, 0.799864],
    (32,  3e-5): [0.771385, 0.788055, 0.791863, 0.793491, 0.800057, 0.799527],
}

# Extract the parameter combinations and their corresponding trajectories
params = list(dic.keys())
trajectories = list(dic.values())

# Draw one line per (batch size, learning rate) combination
plt.figure(figsize=(10, 6))
for param, trajectory in zip(params, trajectories):
    plt.plot(range(1, len(trajectory) + 1), trajectory, label=f'bs={param[0]}, lr={param[1]}')

# Chart title and axis labels
plt.title('Validation Score Trajectory for Different Parameters')
plt.xlabel('Training Epochs')
plt.ylabel('Weighted-avg F1 (validation)')

# Add the legend
plt.legend()

# Display the chart
plt.show()
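
Note that plt.show() may display nothing in a headless environment (e.g., a remote shell); adding plt.savefig('convergence.png', dpi=150) before plt.show() writes the figure to disk instead.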

Appendix

Fine-tuning command

!python ner_finetune.py \
--gpu_device 0 \
--train_batch_size 32 \
--valid_batch_size 32 \
--epochs 6 \
--learning_rate 3e-5 \
--train_file data/cluener2020/train.json \
--valid_file data/cluener2020/dev.json \
--allow_label "{'name': 'PER', 'organization': 'ORG', 'address': 'LOC', 'company': 'ORG', 'government': 'ORG'}" \
--pretrained_model models/bert-base-chinese \
--tokenizer models/bert-base-chinese \
--save_model_dir models/local/bert_tune_5
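
ner_finetune.py itself is not included in this post. Judging from the arguments, --allow_label maps CLUENER's fine-grained categories onto three coarse types (PER/ORG/LOC) and drops the rest, which matches the 7-tag label mapping printed in the log below. A minimal sketch of how such a mapping could turn one CLUENER record into character-level BIO tags; the record format follows the public CLUENER2020 release, while the helper function and the sample line are illustrative assumptions, not the script's actual code:

import json

def cluener_to_bio(record, allow_label):
    """Convert one CLUENER record to character-level BIO tags,
    keeping only entity types listed in allow_label (renamed)."""
    text = record["text"]
    tags = ["O"] * len(text)
    for raw_type, mentions in record["label"].items():
        mapped = allow_label.get(raw_type)
        if mapped is None:                    # types outside allow_label are dropped
            continue
        for spans in mentions.values():
            for start, end in spans:          # CLUENER spans are inclusive
                tags[start] = f"B-{mapped}"
                for i in range(start + 1, end + 1):
                    tags[i] = f"I-{mapped}"
    return list(text), tags

allow_label = {"name": "PER", "organization": "ORG", "address": "LOC",
               "company": "ORG", "government": "ORG"}
line = '{"text": "温格的球队终于又踢了一场经典的比赛", "label": {"name": {"温格": [[0, 1]]}}}'
chars, tags = cluener_to_bio(json.loads(line), allow_label)
print(list(zip(chars, tags))[:4])  # [('温', 'B-PER'), ('格', 'I-PER'), ('的', 'O'), ('球', 'O')]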

Log

Namespace(allow_label={'name': 'PER', 'organization': 'ORG', 'address': 'LOC', 'company': 'ORG', 'government': 'ORG'}, epochs=6, gpu_device='0', learning_rate=3e-05, max_grad_norm=10, max_len=128, pretrained_model='models/bert-base-chinese', save_model_dir='models/local/bert_tune_5', tokenizer='models/bert-base-chinese', train_batch_size=32, train_file='data/cluener2020/train.json', valid_batch_size=32, valid_file='data/cluener2020/dev.json')
CUDA is available!
Number of CUDA devices: 1
Device name: NVIDIA GeForce RTX 2080 Ti
Device capability: (7, 5)
Label mapping: {'O': 0, 'B-PER': 1, 'B-ORG': 2, 'B-LOC': 3, 'I-PER': 4, 'I-ORG': 5, 'I-LOC': 6}
Loading dataset: data/cluener2020/train.json
  0%|                                                 | 0/10748 [00:00<?, ?it/s]2024-05-21 14:05:00.121060: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-05-21 14:05:00.172448: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-21 14:05:00.914503: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
100%|███████████████████████████████████| 10748/10748 [00:06<00:00, 1667.09it/s]
100%|█████████████████████████████████████| 1343/1343 [00:00<00:00, 2244.82it/s]
TRAIN Dataset: 7824
VALID Dataset: 971
Loading model: models/bert-base-chinese
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Some weights of the model checkpoint at models/bert-base-chinese were not used when initializing BertForTokenClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at models/bert-base-chinese and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Training epoch: 1
Training loss per 100 training steps: 2.108242988586426
Training loss per 100 training steps: 0.16535191606767108
Training loss per 100 training steps: 0.10506394136678521
Training loss epoch: 0.09411744458638892
Training accuracy epoch: 0.9225966380147197
Validation loss per 100 evaluation steps: 0.05695410072803497
Validation Loss: 0.03870751528489974
Validation Accuracy: 0.9578078217665675
              precision    recall  f1-score  support
LOC            0.544872  0.683646  0.606421    373.0
ORG            0.750225  0.841734  0.793349    992.0
PER            0.806452  0.913978  0.856855    465.0
micro avg      0.718691  0.827869  0.769426   1830.0
macro avg      0.700516  0.813119  0.752208   1830.0
weighted avg   0.722656  0.827869  0.771385   1830.0
Training epoch: 2
Training loss per 100 training steps: 0.030774801969528198
Training loss per 100 training steps: 0.03080757723033133
Training loss per 100 training steps: 0.03123850032538917
Training loss epoch: 0.03104725396450685
Training accuracy epoch: 0.965836879311368
Validation loss per 100 evaluation steps: 0.07264477759599686
Validation Loss: 0.03662088588480988
Validation Accuracy: 0.961701479064846
              precision    recall  f1-score  support
LOC            0.606635  0.686327  0.644025    373.0
ORG            0.776735  0.834677  0.804665    992.0
PER            0.821497  0.920430  0.868154    465.0
micro avg      0.752613  0.826230  0.787705   1830.0
macro avg      0.734956  0.813812  0.772281   1830.0
weighted avg   0.753439  0.826230  0.788055   1830.0
Training epoch: 3
Training loss per 100 training steps: 0.01707942970097065
Training loss per 100 training steps: 0.020070969108676555
Training loss per 100 training steps: 0.0214405001942717
Training loss epoch: 0.021760025719294744
Training accuracy epoch: 0.9760199331084162
Validation loss per 100 evaluation steps: 0.04943108558654785
Validation Loss: 0.03711987908689245
Validation Accuracy: 0.9608263101353024
              precision    recall  f1-score  support
LOC            0.596847  0.710456  0.648715    373.0
ORG            0.776328  0.839718  0.806780    992.0
PER            0.855967  0.894624  0.874869    465.0
micro avg      0.755866  0.827322  0.789982   1830.0
macro avg      0.743047  0.814932  0.776788   1830.0
weighted avg   0.759981  0.827322  0.791863   1830.0
Training epoch: 4
Training loss per 100 training steps: 0.014015918597579002
Training loss per 100 training steps: 0.015494177154827826
Training loss per 100 training steps: 0.015997812416015278
Training loss epoch: 0.016311514128607756
Training accuracy epoch: 0.9820175765149567
Validation loss per 100 evaluation steps: 0.04825771600008011
Validation Loss: 0.04313824124514095
Validation Accuracy: 0.9585233633276977
              precision    recall  f1-score  support
LOC            0.618037  0.624665  0.621333    373.0
ORG            0.794118  0.843750  0.818182    992.0
PER            0.853955  0.905376  0.878914    465.0
micro avg      0.774948  0.814754  0.794353   1830.0
macro avg      0.755370  0.791264  0.772810   1830.0
weighted avg   0.773433  0.814754  0.793491   1830.0
Training epoch: 5
Training loss per 100 training steps: 0.008429908193647861
Training loss per 100 training steps: 0.012711652241057098
Training loss per 100 training steps: 0.012486798004177747
Training loss epoch: 0.012644028145705862
Training accuracy epoch: 0.9862629694070859
Validation loss per 100 evaluation steps: 0.06491336971521378
Validation Loss: 0.049802260893967845
Validation Accuracy: 0.9582402189526026
              precision    recall  f1-score  support
LOC            0.608899  0.697051  0.650000    373.0
ORG            0.795749  0.867944  0.830280    992.0
PER            0.831643  0.881720  0.855950    465.0
micro avg      0.764735  0.836612  0.799061   1830.0
macro avg      0.745430  0.815572  0.778743   1830.0
weighted avg   0.766785  0.836612  0.800057   1830.0
Training epoch: 6
Training loss per 100 training steps: 0.009717799723148346
Training loss per 100 training steps: 0.008476002312422093
Training loss per 100 training steps: 0.008608183584903456
Training loss epoch: 0.008819052852614194
Training accuracy epoch: 0.9903819524689835
Validation loss per 100 evaluation steps: 0.023518526926636696
Validation Loss: 0.049626993015408516
Validation Accuracy: 0.9602429496287505
              precision    recall  f1-score  support
LOC            0.614251  0.670241  0.641026    373.0
ORG            0.806482  0.852823  0.829005    992.0
PER            0.848548  0.879570  0.863780    465.0
micro avg      0.776574  0.822404  0.798832   1830.0
macro avg      0.756427  0.800878  0.777937   1830.0
weighted avg   0.777989  0.822404  0.799527   1830.0
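
For reference, here is a minimal sketch of the training setup that the Namespace and the log imply: BertForTokenClassification with 7 labels, AdamW at the configured learning rate, and gradient clipping at max_grad_norm=10. The DataLoader construction and variable names are assumptions, not the post's actual script:

import torch
from torch.optim import AdamW
from transformers import BertTokenizerFast, BertForTokenClassification

# Matches the label mapping printed in the log
label2id = {'O': 0, 'B-PER': 1, 'B-ORG': 2, 'B-LOC': 3,
            'I-PER': 4, 'I-ORG': 5, 'I-LOC': 6}

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
tokenizer = BertTokenizerFast.from_pretrained('models/bert-base-chinese')
model = BertForTokenClassification.from_pretrained(
    'models/bert-base-chinese', num_labels=len(label2id)).to(device)
optimizer = AdamW(model.parameters(), lr=3e-5)

def train_one_epoch(loader, max_grad_norm=10):
    """One pass over a DataLoader yielding input_ids/attention_mask/labels dicts."""
    model.train()
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)   # the model returns a loss when 'labels' is given
        optimizer.zero_grad()
        outputs.loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()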
