InternLM (书生·浦语) Large Model Hands-On Camp: Large Model Evaluation with OpenCompass

news2024/9/29 23:40:18

InternLM (书生·浦语) Large Model Hands-On Camp, OpenCompass: Mule or Horse? Take It Out for a Run and See


Why study the evaluation of large models?

A hundred schools of thought contend, and a hundred flowers bloom.

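  • First, studying evaluation is essential for a full picture of the strengths and limitations of large language models. Although many studies report that large language models match or exceed human performance on a range of general tasks, doubts remain about whether these abilities reflect genuine understanding or merely memorization of the training data. For example, a large language model can output the correct answer when given only a LeetCode problem number and no problem statement, which suggests that the training data may be contaminated.
  • Second, studying evaluation helps guide and improve collaboration between humans and large language models. Because these models ultimately serve people, designing better paradigms for human-computer interaction requires a comprehensive assessment of each of their capabilities.
  • Finally, studying evaluation helps us plan the future development of large language models and guard against unknown and latent risks. As the models evolve, their capabilities keep growing. A sound, scientific evaluation mechanism lets us track those capabilities as they evolve and anticipate potential risks early, which makes this a crucial research topic.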

Introduction to OpenCompass

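The scientist team at Shanghai AI Laboratory has officially released "司南" (OpenCompass 2.0), an open-source, open evaluation system for large models that provides one-stop evaluation for large language models, multimodal models, and more. Its main features are: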

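  • Open source and reproducible: a fair, open, and reproducible evaluation scheme for large models.
  • Comprehensive capability dimensions: a design built on five major dimensions, with 70+ datasets and roughly 400,000 questions, for a thorough assessment of model capability.
  • Broad model support: 20+ HuggingFace and API models are already supported.
  • Efficient distributed evaluation: a single command handles task partitioning and distributed evaluation, so a full evaluation of a hundred-billion-parameter model finishes within hours.
  • Diverse evaluation paradigms: zero-shot, few-shot, and chain-of-thought evaluation are supported and can be combined with standard or conversational prompt templates to draw out each model's best performance.
  • Flexible extension: want to add a new model or dataset, customize a more advanced task-partitioning strategy, or even plug in a new cluster management system? Everything in OpenCompass is designed to be extended.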

Evaluation targets

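The main evaluation targets of this library are large language models and large multimodal models. We use large language models as the example to describe the specific model types evaluated.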

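  • Base models: typically obtained by self-supervised training on massive amounts of text (for example OpenAI's GPT-3 and Meta's LLaMA); they usually have strong text-continuation ability.
  • Chat models: typically obtained from a base model through instruction fine-tuning or human-preference alignment (for example OpenAI's ChatGPT and Shanghai AI Laboratory's InternLM (书生·浦语)); they understand human instructions and have strong conversational ability.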

Tool architecture


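  • Model layer: the main kinds of models involved in large-model evaluation. OpenCompass focuses on base models and chat models.
  • Capability layer: OpenCompass designs its evaluation dimensions around general capabilities and specialized capabilities. General capabilities are evaluated along dimensions such as language, knowledge, understanding, reasoning, and safety. Specialized capabilities are evaluated along dimensions such as long context, code, tool use, and knowledge augmentation.
  • Method layer: OpenCompass uses both objective and subjective evaluation. Objective evaluation conveniently measures a model on tasks with definite answers (such as multiple choice, fill-in-the-blank, and closed-ended question answering). Subjective evaluation measures how satisfied users actually are with the model's replies; here OpenCompass uses two approaches, model-assisted subjective evaluation and subjective evaluation based on human feedback.
  • Tool layer: OpenCompass provides rich functionality for automating efficient evaluation of large language models, including distributed evaluation, prompt engineering, integration with evaluation databases, leaderboard publishing, and evaluation report generation.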

Design philosophy

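To evaluate the capabilities of large language models accurately, comprehensively, and systematically, OpenCompass starts from the perspective of artificial general intelligence and combines cutting-edge academic progress with industry best practice to propose a capability evaluation system oriented toward real applications. The OpenCompass capability dimensions cover two major parts: general capabilities and specialized capabilities.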

Evaluation methods

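OpenCompass combines objective and subjective evaluation. For capability dimensions and scenarios with definite answers, it builds rich, well-constructed evaluation sets to assess model capability comprehensively. For open-ended or semi-open-ended questions that reflect model capability, and for model safety questions, it uses a combination of subjective and objective evaluation.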

Objective evaluation

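For objective questions with standard answers, we can use quantitative metrics to compare the model's output against the reference answer and measure performance from the result. Because the output of a large language model is highly free-form, the evaluation stage also needs to constrain and design the model's inputs and outputs so that noisy output affects the evaluation as little as possible; only then do we get a more complete and objective picture of the model's ability.

To better elicit the model's ability in the domain being tested, and to guide it to answer in a given template, OpenCompass uses prompt engineering and in-context learning for objective evaluation.

In practice, objective evaluation usually scores model output in one of the following two ways: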

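  • Discriminative evaluation: the question is combined with each candidate answer, the model's perplexity is computed on every combination, and the answer with the lowest perplexity is taken as the model's final output. For example, if the model's perplexity is 0.1 on "question? answer1" and 0.2 on "question? answer2", answer1 is chosen as the model's output.
  • Generative evaluation: used mainly for generation tasks such as translation, program generation, and logical analysis questions. In practice, the question is given to the model as the raw input, with the answer region left blank for the model to complete. The output usually also needs post-processing so that it meets the dataset's requirements.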
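
The two approaches above can be sketched in a few lines of Python. This is a minimal illustration and not OpenCompass's implementation: `perplexity`, `pick_by_perplexity`, and `first_option_letter` are hypothetical helper names, and the per-token log-probabilities are made-up stand-ins for what a real model would return.

```python
import math
import re
from typing import Dict, List, Optional


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity is the exponential of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_by_perplexity(candidates: Dict[str, List[float]]) -> str:
    """Discriminative evaluation: keep the candidate with the lowest perplexity."""
    return min(candidates, key=lambda name: perplexity(candidates[name]))


def first_option_letter(generated: str, options: str = "ABCD") -> Optional[str]:
    """Generative evaluation: post-process free-form text into an option letter."""
    match = re.search(f"[{options}]", generated)
    return match.group(0) if match else None


# Hypothetical log-probabilities for "question? answer1" and "question? answer2".
scores = {
    "answer1": [-0.1, -0.2, -0.1],
    "answer2": [-0.9, -1.2, -0.7],
}
print(pick_by_perplexity(scores))
print(first_option_letter("The answer is B, because ..."))
```

The post-processing here is deliberately naive (it takes the first capital letter among the options); real datasets usually need stricter extraction rules.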

Subjective evaluation

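Language is vivid and endlessly varied, and many scenarios and capabilities cannot be assessed with objective metrics alone. For evaluations such as model safety and language ability, judgments based on human subjective impressions better reflect a model's real capability and better match how large models are actually used.

The subjective evaluation scheme adopted by OpenCompass relies on the subjective judgment of human raters to assess large language models that have conversational ability. In practice, a set of subjective test questions is built in advance from the model's capability dimensions, different models' replies to the same question are shown to the raters, and the raters' scores based on their subjective impressions are collected. Because subjective testing is expensive, the scheme also uses a high-performing large language model to simulate human raters and assign subjective scores. In actual evaluations, subjective evaluation by real human experts is combined with model-based subjective scoring.

When carrying out subjective evaluation, OpenCompass uses two methods: single-model reply satisfaction statistics and multi-model satisfaction comparison.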
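
As a rough sketch of the two modes named above, the snippet below computes a single-model satisfaction rate and a pairwise win rate from collected judgments. The data layout, the function names, and the sample votes are all assumptions made for illustration; they do not reflect OpenCompass's internal formats.

```python
from collections import Counter
from typing import Dict, List


def satisfaction_rate(ratings: List[bool]) -> float:
    """Single-model mode: share of replies the judge marked as satisfactory."""
    return sum(ratings) / len(ratings)


def win_rates(votes: List[str]) -> Dict[str, float]:
    """Multi-model mode: share of comparisons that went to each outcome."""
    counts = Counter(votes)
    return {outcome: n / len(votes) for outcome, n in counts.items()}


# Hypothetical judgments from human raters or from a judge model.
ratings_model_a = [True, True, False, True]
votes = ["model_a", "model_b", "model_a", "tie"]

print(satisfaction_rate(ratings_model_a))
print(win_rates(votes))
```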

Quick start


Overview

Evaluating a model in OpenCompass usually involves the following stages: configuration -> inference -> evaluation -> visualization.

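  • Configuration: the starting point of the whole workflow. You configure the entire evaluation process and choose the models and datasets to evaluate. You can also choose the evaluation strategy and compute backend, and define how results are displayed.
  • Inference and evaluation: at this stage OpenCompass runs inference and evaluation on the models and datasets in parallel. Inference has the model produce output from the dataset, and evaluation measures how well that output matches the reference answers. Both processes are split into multiple "tasks" that run concurrently for efficiency; note, however, that when compute resources are limited this strategy can make the evaluation slower. For details on this issue and its solutions, see FAQ: Efficiency.
  • Visualization: once evaluation finishes, OpenCompass organizes the results into readable tables and saves them as CSV and TXT files. You can also enable Feishu (Lark) status reporting to receive timely evaluation status reports in the Feishu client.

Next, we demonstrate the basic usage of OpenCompass by evaluating InternLM (书生·浦语) on the C-Eval benchmark. The corresponding configuration file can be found in configs/eval_demo.py.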
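
To make the four stages concrete, here is a toy end-to-end pass in plain Python. It only illustrates the flow: the model, the two-question dataset, and the function names are invented stand-ins, and none of this uses OpenCompass's actual classes or file formats.

```python
import csv
import io
from typing import Callable, Dict, List


# Configuration: pick the "model" and the dataset (both are toy stand-ins).
def toy_model(question: str) -> str:
    """Pretend model that always answers B."""
    return "B"


toy_dataset: List[Dict[str, str]] = [
    {"question": "1 + 1 = ?  A. 1  B. 2", "answer": "B"},
    {"question": "2 + 2 = ?  A. 4  B. 5", "answer": "A"},
]


def infer(model: Callable[[str], str], dataset: List[Dict[str, str]]) -> List[str]:
    """Inference: have the model produce one output per dataset item."""
    return [model(item["question"]) for item in dataset]


def evaluate(predictions: List[str], dataset: List[Dict[str, str]]) -> float:
    """Evaluation: accuracy (in percent) against the reference answers."""
    correct = sum(p == item["answer"] for p, item in zip(predictions, dataset))
    return 100.0 * correct / len(dataset)


def summarize(name: str, accuracy: float) -> str:
    """Visualization: render the result as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["dataset", "metric", "score"])
    writer.writerow([name, "accuracy", f"{accuracy:.2f}"])
    return buffer.getvalue()


predictions = infer(toy_model, toy_dataset)
print(summarize("toy", evaluate(predictions, toy_dataset)))
```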

Environment setup

Create a development machine and conda environment


Installation for GPU environments

studio-conda -o internlm-base -t opencompass
source activate opencompass
git clone -b 0.2.4 https://github.com/open-compass/opencompass
cd opencompass
pip install -e .


pip install -r requirements.txt


Data preparation

Extract the evaluation datasets into data/

cp /share/temp/datasets/OpenCompassData-core-20231110.zip /root/opencompass/
unzip OpenCompassData-core-20231110.zip

A data folder should now appear under opencompass.
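
The same extraction step can also be done from Python with the standard-library `zipfile` module. This is a small sketch: the helper name `extract_datasets` is an assumption, while the archive path comes from the commands above.

```python
import zipfile
from pathlib import Path


def extract_datasets(archive: Path, target_dir: Path) -> bool:
    """Unpack the dataset archive and report whether a data/ folder now exists."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    return (target_dir / "data").is_dir()


if __name__ == "__main__":
    archive = Path("/root/opencompass/OpenCompassData-core-20231110.zip")
    # Only run when the archive is actually present on this machine.
    if archive.exists():
        print(extract_datasets(archive, archive.parent))
```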

List supported datasets and models

List all configurations related to internlm and ceval:

 python tools/list_configs.py internlm ceval

(opencompass) root@intern-studio-061925:~/opencompass# python tools/list_configs.py internlm ceval
+----------------------------------------+----------------------------------------------------------------------+
| Model                                  | Config Path                                                          |
|----------------------------------------+----------------------------------------------------------------------|
| hf_internlm2_1_8b                      | configs/models/hf_internlm/hf_internlm2_1_8b.py                      |
| hf_internlm2_20b                       | configs/models/hf_internlm/hf_internlm2_20b.py                       |
| hf_internlm2_7b                        | configs/models/hf_internlm/hf_internlm2_7b.py                        |
| hf_internlm2_base_20b                  | configs/models/hf_internlm/hf_internlm2_base_20b.py                  |
| hf_internlm2_base_7b                   | configs/models/hf_internlm/hf_internlm2_base_7b.py                   |
| hf_internlm2_chat_1_8b                 | configs/models/hf_internlm/hf_internlm2_chat_1_8b.py                 |
| hf_internlm2_chat_1_8b_sft             | configs/models/hf_internlm/hf_internlm2_chat_1_8b_sft.py             |
| hf_internlm2_chat_20b                  | configs/models/hf_internlm/hf_internlm2_chat_20b.py                  |
| hf_internlm2_chat_20b_sft              | configs/models/hf_internlm/hf_internlm2_chat_20b_sft.py              |
| hf_internlm2_chat_20b_with_system      | configs/models/hf_internlm/hf_internlm2_chat_20b_with_system.py      |
| hf_internlm2_chat_7b                   | configs/models/hf_internlm/hf_internlm2_chat_7b.py                   |
| hf_internlm2_chat_7b_sft               | configs/models/hf_internlm/hf_internlm2_chat_7b_sft.py               |
| hf_internlm2_chat_7b_with_system       | configs/models/hf_internlm/hf_internlm2_chat_7b_with_system.py       |
| hf_internlm2_chat_math_20b             | configs/models/hf_internlm/hf_internlm2_chat_math_20b.py             |
| hf_internlm2_chat_math_20b_with_system | configs/models/hf_internlm/hf_internlm2_chat_math_20b_with_system.py |
| hf_internlm2_chat_math_7b              | configs/models/hf_internlm/hf_internlm2_chat_math_7b.py              |
| hf_internlm2_chat_math_7b_with_system  | configs/models/hf_internlm/hf_internlm2_chat_math_7b_with_system.py  |
| hf_internlm_20b                        | configs/models/hf_internlm/hf_internlm_20b.py                        |
| hf_internlm_7b                         | configs/models/hf_internlm/hf_internlm_7b.py                         |
| hf_internlm_chat_20b                   | configs/models/hf_internlm/hf_internlm_chat_20b.py                   |
| hf_internlm_chat_7b                    | configs/models/hf_internlm/hf_internlm_chat_7b.py                    |
| hf_internlm_chat_7b_8k                 | configs/models/hf_internlm/hf_internlm_chat_7b_8k.py                 |
| hf_internlm_chat_7b_v1_1               | configs/models/hf_internlm/hf_internlm_chat_7b_v1_1.py               |
| internlm_7b                            | configs/models/internlm/internlm_7b.py                               |
| lmdeploy_internlm2_chat_20b            | configs/models/hf_internlm/lmdeploy_internlm2_chat_20b.py            |
| lmdeploy_internlm2_chat_7b             | configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py             |
| ms_internlm_chat_7b_8k                 | configs/models/ms_internlm/ms_internlm_chat_7b_8k.py                 |
+----------------------------------------+----------------------------------------------------------------------+
+--------------------------------+------------------------------------------------------------------+
| Dataset                        | Config Path                                                      |
|--------------------------------+------------------------------------------------------------------|
| ceval_clean_ppl                | configs/datasets/ceval/ceval_clean_ppl.py                        |
| ceval_contamination_ppl_810ec6 | configs/datasets/contamination/ceval_contamination_ppl_810ec6.py |
| ceval_gen                      | configs/datasets/ceval/ceval_gen.py                              |
| ceval_gen_2daf24               | configs/datasets/ceval/ceval_gen_2daf24.py                       |
| ceval_gen_5f30c7               | configs/datasets/ceval/ceval_gen_5f30c7.py                       |
| ceval_internal_ppl_1cd8bf      | configs/datasets/ceval/ceval_internal_ppl_1cd8bf.py              |
| ceval_ppl                      | configs/datasets/ceval/ceval_ppl.py                              |
| ceval_ppl_1cd8bf               | configs/datasets/ceval/ceval_ppl_1cd8bf.py                       |
| ceval_ppl_578f8d               | configs/datasets/ceval/ceval_ppl_578f8d.py                       |
| ceval_ppl_93e5ce               | configs/datasets/ceval/ceval_ppl_93e5ce.py                       |
| ceval_zero_shot_gen_bd40ef     | configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py             |
+--------------------------------+------------------------------------------------------------------+
(opencompass) root@intern-studio-061925:~/opencompass#
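
The table printed above is easy to consume programmatically. The sketch below assumes the pipe-delimited layout shown in this output and uses a hypothetical helper name, `parse_config_table`, to turn such text into a name-to-path mapping.

```python
from typing import Dict


def parse_config_table(text: str) -> Dict[str, str]:
    """Map each config name to its path from a pipe-delimited table."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Keep only data rows: two cells whose second one is a .py path.
        if len(cells) == 2 and cells[1].endswith(".py"):
            mapping[cells[0]] = cells[1]
    return mapping


# Two rows copied from the dataset table above.
sample = """
+-----------+-------------------------------------+
| Dataset   | Config Path                         |
|-----------+-------------------------------------|
| ceval_gen | configs/datasets/ceval/ceval_gen.py |
| ceval_ppl | configs/datasets/ceval/ceval_ppl.py |
+-----------+-------------------------------------+
"""
print(parse_config_table(sample))
```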


Launch the evaluation (10% A100, 8GB resources)

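Once OpenCompass is installed and the datasets are prepared as described above, the following command evaluates the InternLM2-Chat-1.8B model on the C-Eval dataset. Because OpenCompass launches evaluation in parallel by default, it is worth starting the first run in --debug mode and checking for problems. In --debug mode, tasks are executed sequentially and their output is printed in real time.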

python run.py --datasets ceval_gen --hf-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 1024 --max-out-len 16 --batch-size 2 --num-gpus 1 --debug

Command breakdown

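python run.py \
    --datasets ceval_gen \
    --hf-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
    --tokenizer-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
    --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
    --model-kwargs device_map='auto' trust_remote_code=True \
    --max-seq-len 1024 \
    --max-out-len 16 \
    --batch-size 2 \
    --num-gpus 1 \
    --debug

  • --datasets: the dataset configuration to evaluate (here ceval_gen)
  • --hf-path: HuggingFace model path
  • --tokenizer-path: HuggingFace tokenizer path (can be omitted if it is the same as the model path)
  • --tokenizer-kwargs: arguments for building the tokenizer
  • --model-kwargs: arguments for building the model
  • --max-seq-len: maximum sequence length the model can accept
  • --max-out-len: maximum number of tokens to generate
  • --batch-size: batch size
  • --num-gpus: number of GPUs required to run the model
  • --debug: run tasks sequentially and print output in real time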
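
When the evaluation is launched from a script instead of being typed by hand, building the argument list in Python avoids quoting and line-continuation mistakes. The following is a minimal sketch that uses only the flags shown above; the function name `build_eval_command` is hypothetical.

```python
import shlex
from typing import List

MODEL_PATH = "/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"


def build_eval_command(model_path: str = MODEL_PATH, debug: bool = True) -> List[str]:
    """Assemble the run.py invocation as an argument list (no shell quoting needed)."""
    cmd = [
        "python", "run.py",
        "--datasets", "ceval_gen",
        "--hf-path", model_path,
        "--tokenizer-path", model_path,
        "--tokenizer-kwargs", "padding_side=left", "truncation=left", "trust_remote_code=True",
        "--model-kwargs", "device_map=auto", "trust_remote_code=True",
        "--max-seq-len", "1024",
        "--max-out-len", "16",
        "--batch-size", "2",
        "--num-gpus", "1",
    ]
    if debug:
        cmd.append("--debug")
    return cmd


print(shlex.join(build_eval_command()))
# To actually launch it from the opencompass directory:
# import subprocess; subprocess.run(build_eval_command(), check=True)
```

The shell strips the single quotes in `padding_side='left'` before run.py sees them, which is why the list form passes `padding_side=left` directly.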

The run hit an error at this point. Solution:
pip install protobuf
Rerun the script:

If you hit the error "mkl-service + Intel® MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 …", the solution is:

export MKL_SERVICE_FORCE_INTEL=1
# or
export MKL_THREADING_LAYER=GNU
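
If the evaluation is started from Python, the same workaround can be applied by passing a modified environment to the child process. This is a sketch; `mkl_safe_env` is a hypothetical helper name.

```python
import os
from typing import Dict, Optional


def mkl_safe_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the MKL threading layer set to GNU."""
    env = dict(os.environ if base is None else base)
    env["MKL_THREADING_LAYER"] = "GNU"
    return env


env = mkl_safe_env()
print(env["MKL_THREADING_LAYER"])
# Pass it on when launching, e.g. subprocess.run([...], env=env, check=True)
```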


Run it again. The evaluation output is as follows:


(opencompass) root@intern-studio-061925:~/opencompass# export MKL_THREADING_LAYER=GNU
(opencompass) root@intern-studio-061925:~/opencompass# python run.py --datasets ceval_gen --hf-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 1024 --max-out-len 16 --batch-size 2 --num-gpus 1 --debug
04/22 15:26:54 - OpenCompass - INFO - Loading ceval_gen: configs/datasets/ceval/ceval_gen.py
04/22 15:26:54 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
04/22 15:26:55 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
04/22 15:26:55 - OpenCompass - INFO - Partitioned into 1 tasks.
04/22 15:27:36 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_economics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-accountant,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-tax_accountant,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-physician,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-civil_servant,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-urban_and_rural_planner,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-teacher_qualification,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_programming,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-electrical_engineer,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-business_administration,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-art_studies,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-fire_engineer,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-environmental_impact_assessment_engineer,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-education_science,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-professional_tour_guide,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_chemistry,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-metrology_engineer,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-mao_zedong_thought,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-law,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-veterinary_medicine,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-modern_chinese_history,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-chinese_language_and_literature,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-legal_professional,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-logic,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_history,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-plant_protection,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-clinical_medicine,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_architecture,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_biology,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_politics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_chemistry,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_history,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_network,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-operating_system,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_physics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-advanced_mathematics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_physics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chemistry,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_biology,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_mathematics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_physics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-marxism,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_politics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_geography,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-ideological_and_moral_cultivation,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chinese,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-sports_science,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-basic_medicine,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-probability_and_statistics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_mathematics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-discrete_mathematics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_geography]
Loading checkpoint shards: 100%|████████████████████████████████████████████████| 2/2 [00:50<00:00, 25.22s/it]
04/22 15:30:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_economics]
100%|████████████████████████████████████████████████████████████████████| 55/55 [00:00<00:00, 1176973.06it/s]
[2024-04-22 15:30:28,693] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 28/28 [01:02<00:00,  2.24s/it]
04/22 15:31:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-accountant]
100%|████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 1447330.25it/s]
[2024-04-22 15:31:31,622] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 25/25 [01:05<00:00,  2.63s/it]
04/22 15:32:37 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-tax_accountant]
100%|████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 1208946.45it/s]
[2024-04-22 15:32:37,873] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 25/25 [01:05<00:00,  2.62s/it]
04/22 15:33:43 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-physician]
100%|█████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 856337.07it/s]
[2024-04-22 15:33:43,519] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 25/25 [00:42<00:00,  1.71s/it]
04/22 15:34:26 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-civil_servant]
100%|████████████████████████████████████████████████████████████████████| 47/47 [00:00<00:00, 1359533.02it/s]
[2024-04-22 15:34:26,580] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 24/24 [01:09<00:00,  2.90s/it]
04/22 15:35:36 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-urban_and_rural_planner]
100%|████████████████████████████████████████████████████████████████████| 46/46 [00:00<00:00, 1330606.79it/s]
[2024-04-22 15:35:36,320] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 23/23 [00:42<00:00,  1.85s/it]
04/22 15:36:19 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-teacher_qualification]
100%|████████████████████████████████████████████████████████████████████| 44/44 [00:00<00:00, 1408773.86it/s]
[2024-04-22 15:36:19,129] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 22/22 [00:37<00:00,  1.69s/it]
04/22 15:36:56 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_programming]
100%|█████████████████████████████████████████████████████████████████████| 37/37 [00:00<00:00, 952081.28it/s]
[2024-04-22 15:36:56,550] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 19/19 [00:41<00:00,  2.17s/it]
04/22 15:37:37 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-electrical_engineer]
100%|████████████████████████████████████████████████████████████████████| 37/37 [00:00<00:00, 1149549.99it/s]
[2024-04-22 15:37:37,913] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 19/19 [00:38<00:00,  2.02s/it]
04/22 15:38:16 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-business_administration]
100%|████████████████████████████████████████████████████████████████████| 33/33 [00:00<00:00, 1032925.61it/s]
[2024-04-22 15:38:16,512] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 17/17 [00:32<00:00,  1.90s/it]
04/22 15:38:48 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-art_studies]
100%|████████████████████████████████████████████████████████████████████| 33/33 [00:00<00:00, 1125301.07it/s]
[2024-04-22 15:38:48,984] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 17/17 [00:25<00:00,  1.48s/it]
04/22 15:39:14 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-fire_engineer]
100%|█████████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 977619.73it/s]
[2024-04-22 15:39:14,342] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 16/16 [00:34<00:00,  2.18s/it]
04/22 15:39:49 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-environmental_impact_assessment_engineer]
100%|████████████████████████████| 31/31 [00:00<00:00, 714414.42it/s]
[2024-04-22 15:39:49,343] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 16/16 [00:31<00:00,  1.99s/it]
04/22 15:40:21 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-education_science]
100%|████████████████████████████| 29/29 [00:00<00:00, 887845.37it/s]
[2024-04-22 15:40:21,321] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 15/15 [00:24<00:00,  1.65s/it]
04/22 15:40:46 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-professional_tour_guide]
100%|████████████████████████████| 29/29 [00:00<00:00, 800229.05it/s]
[2024-04-22 15:40:46,143] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 15/15 [00:30<00:00,  2.01s/it]
04/22 15:41:16 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_chemistry]
100%|████████████████████████████| 24/24 [00:00<00:00, 867787.03it/s]
[2024-04-22 15:41:16,371] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 12/12 [00:30<00:00,  2.56s/it]
04/22 15:41:47 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-metrology_engineer]
100%|████████████████████████████| 24/24 [00:00<00:00, 860370.05it/s]
[2024-04-22 15:41:47,324] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 12/12 [00:26<00:00,  2.21s/it]
04/22 15:42:14 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-mao_zedong_thought]
100%|████████████████████████████| 24/24 [00:00<00:00, 689474.63it/s]
[2024-04-22 15:42:14,099] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 12/12 [00:24<00:00,  2.01s/it]
04/22 15:42:38 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-law]
100%|█████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 906876.54it/s]
[2024-04-22 15:42:38,372] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:29<00:00,  2.50s/it]
04/22 15:43:08 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-veterinary_medicine]
100%|█████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 725330.77it/s]
[2024-04-22 15:43:08,540] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:20<00:00,  1.67s/it]
04/22 15:43:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-modern_chinese_history]
100%|█████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 853707.89it/s]
[2024-04-22 15:43:28,748] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:24<00:00,  2.08s/it]
04/22 15:43:53 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-chinese_language_and_literature]
100%|█████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 665303.39it/s]
[2024-04-22 15:43:53,825] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:19<00:00,  1.65s/it]
04/22 15:44:13 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-legal_professional]
100%|█████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 810663.80it/s]
[2024-04-22 15:44:13,833] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:37<00:00,  3.15s/it]
04/22 15:44:51 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-logic]
100%|█████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 846556.77it/s]
[2024-04-22 15:44:51,898] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:42<00:00,  3.84s/it]
04/22 15:45:34 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_history]
100%|█████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 775417.55it/s]
[2024-04-22 15:45:34,567] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:23<00:00,  2.18s/it]
04/22 15:45:58 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-plant_protection]
100%|█████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 595320.57it/s]
[2024-04-22 15:45:58,686] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:16<00:00,  1.53s/it]
04/22 15:46:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-clinical_medicine]
100%|█████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 795471.45it/s]
[2024-04-22 15:46:15,740] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:28<00:00,  2.59s/it]
04/22 15:46:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_architecture]
100%|█████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 587202.56it/s]
[2024-04-22 15:46:44,319] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:28<00:00,  2.60s/it]
04/22 15:47:12 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_biology]
100%|█████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 752823.79it/s]
[2024-04-22 15:47:13,049] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:20<00:00,  1.90s/it]
04/22 15:47:34 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_politics]
100%|█████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 579476.21it/s]
[2024-04-22 15:47:34,348] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:38<00:00,  3.49s/it]
04/22 15:48:12 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_chemistry]
100%|█████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 612307.15it/s]
[2024-04-22 15:48:12,963] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:31<00:00,  3.12s/it]
04/22 15:48:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_history]
100%|█████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 704925.04it/s]
[2024-04-22 15:48:44,261] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:20<00:00,  2.01s/it]
04/22 15:49:04 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_network]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 622592.00it/s]
[2024-04-22 15:49:04,556] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.58s/it]
04/22 15:49:20 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-operating_system]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 731117.21it/s]
[2024-04-22 15:49:20,460] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:13<00:00,  1.36s/it]
04/22 15:49:34 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_physics]
100%|████████████████████████████| 19/19 [00:00<00:00, 692971.97it/s]
[2024-04-22 15:49:34,308] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:26<00:00,  2.60s/it]
04/22 15:50:00 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-advanced_mathematics]
100%|████████████████████████████| 19/19 [00:00<00:00, 569226.97it/s]
[2024-04-22 15:50:00,577] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:31<00:00,  3.19s/it]
04/22 15:50:32 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_physics]
100%|████████████████████████████| 19/19 [00:00<00:00, 485925.46it/s]
[2024-04-22 15:50:32,703] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:17<00:00,  1.73s/it]
04/22 15:50:50 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chemistry]
100%|████████████████████████████| 19/19 [00:00<00:00, 664098.13it/s]
[2024-04-22 15:50:50,151] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:18<00:00,  1.89s/it]
04/22 15:51:09 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_biology]
100%|████████████████████████████| 19/19 [00:00<00:00, 498073.60it/s]
[2024-04-22 15:51:09,228] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:19<00:00,  1.91s/it]
04/22 15:51:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_mathematics]
100%|████████████████████████████| 19/19 [00:00<00:00, 608334.17it/s]
[2024-04-22 15:51:28,484] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:25<00:00,  2.52s/it]
04/22 15:51:53 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_physics]
100%|████████████████████████████| 19/19 [00:00<00:00, 699050.67it/s]
[2024-04-22 15:51:53,900] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:18<00:00,  1.90s/it]
04/22 15:52:13 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-marxism]
100%|████████████████████████████| 19/19 [00:00<00:00, 504378.33it/s]
[2024-04-22 15:52:13,408] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:15<00:00,  1.56s/it]
04/22 15:52:29 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_politics]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 664098.13it/s]
[2024-04-22 15:52:29,316] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:38<00:00,  3.87s/it]
04/22 15:53:08 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_geography]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 744782.95it/s]
[2024-04-22 15:53:08,169] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:16<00:00,  1.67s/it]
04/22 15:53:24 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-ideological_and_moral_cultivation]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 681126.29it/s]
[2024-04-22 15:53:25,034] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:14<00:00,  1.44s/it]
04/22 15:53:39 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chinese]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 705236.96it/s]
[2024-04-22 15:53:39,637] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:36<00:00,  3.60s/it]
04/22 15:54:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-sports_science]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 692971.97it/s]
[2024-04-22 15:54:15,783] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.58s/it]
04/22 15:54:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-basic_medicine]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 681126.29it/s]
[2024-04-22 15:54:31,735] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:16<00:00,  1.66s/it]
04/22 15:54:48 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-probability_and_statistics]
100%|█████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 645277.54it/s]
[2024-04-22 15:54:48,620] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████████████████████████████████████| 9/9 [00:39<00:00,  4.43s/it]
04/22 15:55:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_mathematics]
100%|█████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 656499.76it/s]
[2024-04-22 15:55:28,709] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████████████████████████████████████| 9/9 [00:32<00:00,  3.60s/it]
04/22 15:56:01 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-discrete_mathematics]
100%|█████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 524288.00it/s]
[2024-04-22 15:56:01,183] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████████████████████████████████████| 8/8 [00:14<00:00,  1.78s/it]
04/22 15:56:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_geography]
100%|█████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 479349.03it/s]
[2024-04-22 15:56:15,577] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00,  1.95s/it]
04/22 15:56:27 - OpenCompass - INFO - time elapsed: 1730.61s
04/22 15:56:47 - OpenCompass - INFO - Partitioned into 52 tasks.
04/22 15:56:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_network]: {'accuracy': 47.368421052631575}
04/22 15:56:52 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-operating_system]: {'accuracy': 47.368421052631575}
04/22 15:56:54 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_architecture]: {'accuracy': 23.809523809523807}
04/22 15:56:56 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_programming]: {'accuracy': 13.513513513513514}
04/22 15:56:59 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_physics]: {'accuracy': 42.10526315789473}
04/22 15:57:01 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_chemistry]: {'accuracy': 33.33333333333333}
04/22 15:57:03 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-advanced_mathematics]: {'accuracy': 10.526315789473683}
04/22 15:57:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-probability_and_statistics]: {'accuracy': 38.88888888888889}
04/22 15:57:08 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-discrete_mathematics]: {'accuracy': 25.0}
04/22 15:57:10 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-electrical_engineer]: {'accuracy': 27.027027027027028}
04/22 15:57:12 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-metrology_engineer]: {'accuracy': 54.166666666666664}
04/22 15:57:14 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_mathematics]: {'accuracy': 16.666666666666664}
04/22 15:57:17 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_physics]: {'accuracy': 42.10526315789473}
04/22 15:57:19 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chemistry]: {'accuracy': 47.368421052631575}
04/22 15:57:21 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_biology]: {'accuracy': 26.31578947368421}
04/22 15:57:23 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_mathematics]: {'accuracy': 36.84210526315789}
04/22 15:57:25 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_biology]: {'accuracy': 80.95238095238095}
04/22 15:57:28 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_physics]: {'accuracy': 47.368421052631575}
04/22 15:57:30 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_chemistry]: {'accuracy': 80.0}
04/22 15:57:33 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-veterinary_medicine]: {'accuracy': 43.47826086956522}
04/22 15:57:35 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_economics]: {'accuracy': 32.72727272727273}
04/22 15:57:38 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-business_administration]: {'accuracy': 36.36363636363637}
04/22 15:57:40 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-marxism]: {'accuracy': 68.42105263157895}
04/22 15:57:43 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-mao_zedong_thought]: {'accuracy': 70.83333333333334}
04/22 15:57:45 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-education_science]: {'accuracy': 55.172413793103445}
04/22 15:57:48 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-teacher_qualification]: {'accuracy': 59.09090909090909}
04/22 15:57:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_politics]: {'accuracy': 57.89473684210527}
04/22 15:57:53 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_geography]: {'accuracy': 47.368421052631575}
04/22 15:57:55 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_politics]: {'accuracy': 71.42857142857143}
04/22 15:57:58 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_geography]: {'accuracy': 75.0}
04/22 15:58:00 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-modern_chinese_history]: {'accuracy': 52.17391304347826}
04/22 15:58:02 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-ideological_and_moral_cultivation]: {'accuracy': 73.68421052631578}
04/22 15:58:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-logic]: {'accuracy': 27.27272727272727}
04/22 15:58:07 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-law]: {'accuracy': 29.166666666666668}
04/22 15:58:09 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-chinese_language_and_literature]: {'accuracy': 47.82608695652174}
04/22 15:58:11 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-art_studies]: {'accuracy': 42.42424242424242}
04/22 15:58:14 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-professional_tour_guide]: {'accuracy': 51.724137931034484}
04/22 15:58:16 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-legal_professional]: {'accuracy': 34.78260869565217}
04/22 15:58:19 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chinese]: {'accuracy': 42.10526315789473}
04/22 15:58:21 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_history]: {'accuracy': 65.0}
04/22 15:58:23 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_history]: {'accuracy': 86.36363636363636}
04/22 15:58:25 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-civil_servant]: {'accuracy': 42.5531914893617}
04/22 15:58:28 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-sports_science]: {'accuracy': 52.63157894736842}
04/22 15:58:30 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-plant_protection]: {'accuracy': 40.909090909090914}
04/22 15:58:33 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-basic_medicine]: {'accuracy': 68.42105263157895}
04/22 15:58:35 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-clinical_medicine]: {'accuracy': 31.818181818181817}
04/22 15:58:37 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-urban_and_rural_planner]: {'accuracy': 47.82608695652174}
04/22 15:58:40 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-accountant]: {'accuracy': 36.734693877551024}
04/22 15:58:42 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-fire_engineer]: {'accuracy': 38.70967741935484}
04/22 15:58:44 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-environmental_impact_assessment_engineer]: {'accuracy': 51.61290322580645}
04/22 15:58:47 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-tax_accountant]: {'accuracy': 36.734693877551024}
04/22 15:58:49 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-physician]: {'accuracy': 42.857142857142854}
dataset                                         version    metric         mode      opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b
----------------------------------------------  ---------  -------------  ------  ---------------------------------------------------------------------------------------
ceval-computer_network                          db9ce2     accuracy       gen                                                                                       47.37
ceval-operating_system                          1c2571     accuracy       gen                                                                                       47.37
ceval-computer_architecture                     a74dad     accuracy       gen                                                                                       23.81
ceval-college_programming                       4ca32a     accuracy       gen                                                                                       13.51
ceval-college_physics                           963fa8     accuracy       gen                                                                                       42.11
ceval-college_chemistry                         e78857     accuracy       gen                                                                                       33.33
ceval-advanced_mathematics                      ce03e2     accuracy       gen                                                                                       10.53
ceval-probability_and_statistics                65e812     accuracy       gen                                                                                       38.89
ceval-discrete_mathematics                      e894ae     accuracy       gen                                                                                       25
ceval-electrical_engineer                       ae42b9     accuracy       gen                                                                                       27.03
ceval-metrology_engineer                        ee34ea     accuracy       gen                                                                                       54.17
ceval-high_school_mathematics                   1dc5bf     accuracy       gen                                                                                       16.67
ceval-high_school_physics                       adf25f     accuracy       gen                                                                                       42.11
ceval-high_school_chemistry                     2ed27f     accuracy       gen                                                                                       47.37
ceval-high_school_biology                       8e2b9a     accuracy       gen                                                                                       26.32
ceval-middle_school_mathematics                 bee8d5     accuracy       gen                                                                                       36.84
ceval-middle_school_biology                     86817c     accuracy       gen                                                                                       80.95
ceval-middle_school_physics                     8accf6     accuracy       gen                                                                                       47.37
ceval-middle_school_chemistry                   167a15     accuracy       gen                                                                                       80
ceval-veterinary_medicine                       b4e08d     accuracy       gen                                                                                       43.48
ceval-college_economics                         f3f4e6     accuracy       gen                                                                                       32.73
ceval-business_administration                   c1614e     accuracy       gen                                                                                       36.36
ceval-marxism                                   cf874c     accuracy       gen                                                                                       68.42
ceval-mao_zedong_thought                        51c7a4     accuracy       gen                                                                                       70.83
ceval-education_science                         591fee     accuracy       gen                                                                                       55.17
ceval-teacher_qualification                     4e4ced     accuracy       gen                                                                                       59.09
ceval-high_school_politics                      5c0de2     accuracy       gen                                                                                       57.89
ceval-high_school_geography                     865461     accuracy       gen                                                                                       47.37
ceval-middle_school_politics                    5be3e7     accuracy       gen                                                                                       71.43
ceval-middle_school_geography                   8a63be     accuracy       gen                                                                                       75
ceval-modern_chinese_history                    fc01af     accuracy       gen                                                                                       52.17
ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen                                                                                       73.68
ceval-logic                                     f5b022     accuracy       gen                                                                                       27.27
ceval-law                                       a110a1     accuracy       gen                                                                                       29.17
ceval-chinese_language_and_literature           0f8b68     accuracy       gen                                                                                       47.83
ceval-art_studies                               2a1300     accuracy       gen                                                                                       42.42
ceval-professional_tour_guide                   4e673e     accuracy       gen                                                                                       51.72
ceval-legal_professional                        ce8787     accuracy       gen                                                                                       34.78
ceval-high_school_chinese                       315705     accuracy       gen                                                                                       42.11
ceval-high_school_history                       7eb30a     accuracy       gen                                                                                       65
ceval-middle_school_history                     48ab4a     accuracy       gen                                                                                       86.36
ceval-civil_servant                             87d061     accuracy       gen                                                                                       42.55
ceval-sports_science                            70f27b     accuracy       gen                                                                                       52.63
ceval-plant_protection                          8941f9     accuracy       gen                                                                                       40.91
ceval-basic_medicine                            c409d6     accuracy       gen                                                                                       68.42
ceval-clinical_medicine                         49e82d     accuracy       gen                                                                                       31.82
ceval-urban_and_rural_planner                   95b885     accuracy       gen                                                                                       47.83
ceval-accountant                                002837     accuracy       gen                                                                                       36.73
ceval-fire_engineer                             bc23f5     accuracy       gen                                                                                       38.71
ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen                                                                                       51.61
ceval-tax_accountant                            3a5e3c     accuracy       gen                                                                                       36.73
ceval-physician                                 6e277d     accuracy       gen                                                                                       42.86
ceval-stem                                      -          naive_average  gen                                                                                       39.21
ceval-social-science                            -          naive_average  gen                                                                                       57.43
ceval-humanities                                -          naive_average  gen                                                                                       50.23
ceval-other                                     -          naive_average  gen                                                                                       44.62
ceval-hard                                      -          naive_average  gen                                                                                       32
ceval                                           -          naive_average  gen                                                                                       46.19
04/22 15:58:49 - OpenCompass - INFO - write summary to /root/opencompass/outputs/default/20240422_152654/summary/summary_20240422_152654.txt
04/22 15:58:49 - OpenCompass - INFO - write csv to /root/opencompass/outputs/default/20240422_152654/summary/summary_20240422_152654.csv
(opencompass) root@intern-studio-061925:~/opencompass#
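
The `naive_average` rows in the summary above appear to be plain unweighted means of the per-subject accuracies. The sketch below checks this for `ceval-other`, using the eleven accuracy values copied from the rows above. Two things are assumptions and are not stated in the log: that these eleven subjects make up the "other" group, and that `naive_average` means an unweighted mean.

```python
from statistics import mean

# Accuracy values copied from the summary rows above.
# Assumption: these eleven subjects form the "ceval-other" group.
other_subsets = {
    "ceval-civil_servant": 42.55,
    "ceval-sports_science": 52.63,
    "ceval-plant_protection": 40.91,
    "ceval-basic_medicine": 68.42,
    "ceval-clinical_medicine": 31.82,
    "ceval-urban_and_rural_planner": 47.83,
    "ceval-accountant": 36.73,
    "ceval-fire_engineer": 38.71,
    "ceval-environmental_impact_assessment_engineer": 51.61,
    "ceval-tax_accountant": 36.73,
    "ceval-physician": 42.86,
}


def naive_average(scores):
    """Unweighted mean: every subject counts equally, whatever its size."""
    return mean(scores.values())


print(round(naive_average(other_subsets), 2))  # 44.62, matching the ceval-other row
```

Because every subject carries the same weight, a small subject with a few dozen questions moves the group score as much as a large one.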


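Needle in a Haystack: Stars Hidden in the Deep Sea, a Pearl Hard to Find in an Ocean of Words

The needle-in-a-haystack test (inspired by NeedleInAHaystack) inserts a key piece of information at a random position in a long text to form the prompt for a large language model (LLM), then checks whether the model can extract that information from the long text. It measures the model's long-text information extraction ability, which reflects an LLM's basic capacity for long-text understanding.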
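
The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the OpenCompass implementation: `insert_needle` and `build_prompt` are hypothetical names, and depth is measured in characters here, whereas a real benchmark would normally count tokens.

```python
def insert_needle(haystack: str, needle: str, depth_percent: float) -> str:
    """Insert `needle` at about `depth_percent` (0-100) of `haystack`.

    Assumption: depth is measured in characters, not tokens.
    """
    if not 0 <= depth_percent <= 100:
        raise ValueError("depth_percent must be between 0 and 100")
    position = int(len(haystack) * depth_percent / 100)
    return haystack[:position] + needle + haystack[position:]


def build_prompt(haystack: str, needle: str, depth_percent: float,
                 question: str) -> str:
    """Build the prompt: the long context with the needle, then the question."""
    context = insert_needle(haystack, needle, depth_percent)
    return f"{context}\n\n{question}"


if __name__ == "__main__":
    filler = "This sentence is filler text. " * 20
    prompt = build_prompt(filler, "The secret code is 1234. ", 50,
                          "What is the secret code?")
    print(prompt)
```

Sweeping `depth_percent` from 0 to 100 over several context lengths gives the grid of test cases that the heatmaps in such reports are usually drawn from.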

GPT-4 Turbo (128K) shows poor accuracy when the corpus length exceeds 72K and the sentence (the "needle") is hidden near the beginning of the text.

https://github.com/gkamradt/LLMTest_NeedleInAHaystack/tree/main
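Claude 2.1 appears to lose accuracy once the corpus length exceeds 20K, and accuracy is especially poor when the sentence (the "needle") is hidden near the start of the corpus.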

https://github.com/gkamradt/LLMTest_NeedleInAHaystack/blob/main/viz/CreateVisFromLangSmithTesting.ipynb

Kimi Chat publishes its needle-in-a-haystack long-text stress test results
https://mp.weixin.qq.com/s?__biz=Mzk0NDU1MDkyNg==&mid=2247483766&idx=1&sn=8754ec4138905dd12c44d321957f0956&chksm=c323a417f4542d01106c1821c7b7fac4c9b0a55e9f97f72a0f3d8dec4359bf94e4257660ff1a&mpshare=1&scene=23&srcid=0126SmQjMKBNs9wTvHOJDAxi&sharer_shareinfo=52aa3b48e79441dd8d77643f5c91fe7f&sharer_shareinfo_first=a2f6f776598a5f3f32ca224e4ef5f5e3#rd


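Dataset introduction

The Skywork/ChineseDomainModelingEval dataset collects high-quality Chinese articles published between September and October 2023, covering multiple domains. These articles provide a fair and challenging benchmark. The dataset includes the following domain-specific files: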

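  • zh_finance.jsonl: finance
  • zh_game.jsonl: games
  • zh_government.jsonl: government affairs
  • zh_movie.jsonl: movies
  • zh_tech.jsonl: technology
  • zh_general.jsonl: general

These files are used to evaluate how well an LLM understands different specific domains.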

Evaluation steps

  • Download the dataset from Skywork/ChineseDomainModelingEval.
  • Place the downloaded files under opencompass/data/CDME/.

[Figure: expected file structure of the CDME directory]
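
Before launching an evaluation, it can help to confirm that the files are where the configuration expects them. The sketch below is a hypothetical helper, not part of OpenCompass: it assumes the six file names from the dataset list sit directly inside the CDME directory, and that each file is in JSON Lines format (one JSON object per line), as the `.jsonl` extension suggests. No field names are assumed.

```python
import json
from pathlib import Path

# File names taken from the dataset description in this article.
EXPECTED_FILES = [
    "zh_finance.jsonl",
    "zh_game.jsonl",
    "zh_government.jsonl",
    "zh_movie.jsonl",
    "zh_tech.jsonl",
    "zh_general.jsonl",
]


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


def read_jsonl(path):
    """Read a JSON Lines file into a list of objects, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumption: the script runs from the opencompass repository root.
    print("missing:", missing_files("data/CDME"))
```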

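Configuring the dataset

In the latest version, datasets are no longer generated manually by running a script. Instead, they are defined and loaded dynamically in the configuration file. Users specify the dataset parameters in the configuration file according to their needs, which gives more flexibility and room for customization.

Dataset configuration example

The following example shows how a dataset is defined in the configuration file configs/datasets/cdme/cdme8k.py. It configures a Chinese dataset with a context length of 8000 tokens.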

for original_context_length in context_lengths:
    for depth_percent in generate_depth_percents(
            document_depth_percent_intervals,
            document_depth_percent_interval_type):
        dataset_dict = {
            'abbr': f'CDME_Length{original_context_length}Depth{int(depth_percent)}',
            'type': CDMEDataset,
            'path': base_path,
            'length': original_context_length,
            'depth': int(depth_percent),
            'tokenizer_model': 'gpt-4',
            'file_list': file_list,
            'num_repeats_per_file': 10,
            'length_buffer': 200,
            'guide': True,
            'language': 'Chinese',
            'needle': '\n小明最喜欢的实习的地点就是上海人工智能实验室。\n',
            'retrieval_question': '小明最喜欢的实习地点是哪里?请按照“小明最喜欢的实习地点就是________。”的格式回答。',
            'reader_cfg': cdme_reader_cfg,
            'infer_cfg': cdme_infer_cfg,
            'eval_cfg': cdme_eval_cfg
        }
        cdme_datasets.append(dataset_dict)

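The main parameters in this configuration are:

abbr: the short name of the dataset.

type: the dataset type.

path: the path to the dataset files.

length: the context length, in tokens.

depth: the document depth as a percentage, that is, how far into the context the needle is placed.

tokenizer_model: the tokenizer model to use.

file_list: the list of source data files.

num_repeats_per_file: the number of times each file is repeated.

length_buffer: the length buffer.

guide: whether the dataset is a guided one.

language: the language of the dataset.

needle: the specific text (the "needle") to be found in the dataset.

retrieval_question: the question that prompts the model to retrieve the needle.

reader_cfg, infer_cfg, eval_cfg: the reading, inference, and evaluation configurations, respectively.

By defining these parameters in the configuration file, you can flexibly create datasets that fit your needs. The configuration file offers a highly customizable and extensible way to manage how datasets are generated and used.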
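
The nested loop in the configuration produces one dataset per (context length, depth) pair. The sketch below illustrates that enumeration with a hypothetical helper, `even_depth_percents`, which spaces depths evenly between 0 and 100. It is not the `generate_depth_percents` function used in the real configuration, whose interval types are not reproduced here. Only the `abbr` naming pattern is taken from the example above.

```python
from typing import List


def even_depth_percents(intervals: int) -> List[float]:
    """Return `intervals` evenly spaced depths from 0 to 100 inclusive.

    Hypothetical stand-in for the depth generator in the real config.
    """
    if intervals < 2:
        raise ValueError("need at least two intervals to cover 0 and 100")
    step = 100 / (intervals - 1)
    return [i * step for i in range(intervals)]


def dataset_abbrs(context_lengths: List[int], intervals: int) -> List[str]:
    """List the dataset short names for every (length, depth) combination."""
    return [
        f"CDME_Length{length}Depth{int(depth)}"
        for length in context_lengths
        for depth in even_depth_percents(intervals)
    ]


if __name__ == "__main__":
    for name in dataset_abbrs([8000], 5):
        print(name)
```

With one context length and five depths this yields five datasets. Each of them is further multiplied by `num_repeats_per_file` and the number of source files, so the grid grows quickly as lengths and depths are added.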

To evaluate with the internlm model, the following commands can be used:

python run.py configs/eval_needleinahaystack.py --slurm -p partition_name -q auto --max-num-workers 32 

python run.py configs/eval_needlebench.py --slurm -p partition_name -q auto --max-num-workers 32

