Deploying Qwen2-7B-Instruct on Ascend 910B with Streaming Output [PyTorch Framework]: NPU Inference

news2025/1/23 21:07:21

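Contents

  • Background
    • torch_npu framework
    • MindSpore framework
    • MindNLP framework
  • Downloading the model
    • Outside China
    • Within China
  • Environment setup
  • Code adaptation (non-streaming)
    • Main
    • Branch
    • Results
  • Code adaptation (streaming)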

Background

torch_npu framework

Not officially adapted.

MindSpore framework

Not officially adapted.

MindNLP framework

Officially adapted, but extremely slow: about 10 seconds per character.

Downloading the model

Outside China

Hugging Face

Within China

ModelScope
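For the ModelScope route, a scripted download could look like the sketch below. It assumes the `modelscope` package is installed (`pip install modelscope`, which is not part of the setup listed in this post) and that the model id is `qwen/Qwen2-7B-Instruct`, inferred from the checkpoint path used in the Main script. `expected_cache_path` and `download_model` are hypothetical helper names; the cache layout simply mirrors that path and may differ in other ModelScope releases.

```python
import posixpath

# Assumed model id, inferred from the local checkpoint path used in the Main script.
MODEL_ID = "qwen/Qwen2-7B-Instruct"


def expected_cache_path(model_id, cache_root="/root/.cache/modelscope"):
    """Hypothetical helper: directory where the checkpoint is expected to land."""
    return posixpath.join(cache_root, "hub", model_id)


def download_model(model_id=MODEL_ID):
    """Download the checkpoint through ModelScope and return its local directory."""
    # Imported lazily so the helper above works without modelscope installed.
    from modelscope import snapshot_download

    return snapshot_download(model_id)
```

Calling `download_model()` once should return the local directory, which can then be compared with `expected_cache_path(MODEL_ID)` and used as the checkpoint path.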

Environment setup

pip install transformers==4.39.2
pip3 install torch==2.1.0
pip3 install torch-npu==2.1.0.post4
pip3 install accelerate==0.24.1
pip3 install transformers-stream-generator==0.0.5
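Since the Branch step later depends on exact line numbers inside transformers, it can help to confirm that the pinned versions are really the ones installed. The following is a minimal sketch using only the standard library; `PINNED`, `installed_version` and `find_mismatches` are illustrative names, and a local build suffix (for example a `+cpu` tag on torch) would be reported as a mismatch by this simple comparison.

```python
from importlib import metadata

# Versions pinned by the pip commands above.
PINNED = {
    "transformers": "4.39.2",
    "torch": "2.1.0",
    "torch-npu": "2.1.0.post4",
    "accelerate": "0.24.1",
    "transformers-stream-generator": "0.0.5",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Running `print(find_mismatches(PINNED))` in the target environment prints an empty dict when everything matches.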

Code adaptation (non-streaming)

Main

import torch
import torch_npu  # registers the "npu" device with PyTorch

torch_device = "npu:1"  # card index, 0~7
torch.npu.set_device(torch.device(torch_device))
# Turn off just-in-time operator compilation.
torch.npu.set_compile_mode(jit_compile=False)
# Put the Tril operator on the fuzzy-compile blacklist.
option = {}
option["NPU_FUZZY_COMPILE_BLACKLIST"] = "Tril"
torch.npu.set_option(option)

from transformers import AutoModelForCausalLM, AutoTokenizer

# Local checkpoint directory (downloaded via ModelScope)
DEFAULT_CKPT_PATH = '/root/.cache/modelscope/hub/qwen/Qwen2-7B-Instruct'
model = AutoModelForCausalLM.from_pretrained(
    DEFAULT_CKPT_PATH,
    torch_dtype=torch.float16,
    device_map=torch_device
).npu().eval()
tokenizer = AutoTokenizer.from_pretrained(DEFAULT_CKPT_PATH)

# Simple chat loop; type "exit" to quit.
while True:
    prompt = input("user:")
    if prompt == "exit":
        break
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    # Render the conversation into the model's chat format as plain text.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(torch_device)

    generated_ids = model.generate(
        model_inputs.input_ids,
        attention_mask=model_inputs.attention_mask,
        max_new_tokens=512
    )
    # generate() returns prompt + reply; keep only the newly generated tokens.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print("Qwen2-7B-Instruct:", response)
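The list comprehension near the end of the loop is easy to misread, so here is a small stand-alone illustration of what it does. `generate()` returns the prompt tokens followed by the new tokens, and the slice drops the prompt part. The token ids below are made up and plain lists stand in for tensors; `strip_prompt` is a hypothetical name.

```python
def strip_prompt(input_ids_batch, output_ids_batch):
    """Drop the echoed prompt from each generated sequence in the batch."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Made-up token ids: a 3-token prompt followed by a 2-token reply.
prompt_batch = [[101, 102, 103]]
generated_batch = [[101, 102, 103, 7, 8]]
reply_batch = strip_prompt(prompt_batch, generated_batch)
```

Here `reply_batch` holds only the two reply tokens, which is what gets passed to `batch_decode` in the script above.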

Branch

Find the Python interpreter of your virtual environment:

which python

Mine is /root/anaconda3/envs/sakura/bin/python
Inside that environment, locate lib/python3.9/site-packages/transformers/generation/utils.py. For example:

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py
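The same file can also be located without assembling the path by hand. This is a small sketch using the standard library; `locate_module_file` is a hypothetical helper name, and it has to be run with the interpreter of the environment in question.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the source file that would be loaded for `module_name`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate {module_name}")
    return spec.origin
```

In the environment above, `print(locate_module_file("transformers.generation.utils"))` should print the path of the `utils.py` to edit.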

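Go to line 2708 and comment out lines 2708 through 2712 (these line numbers apply to the transformers==4.39.2 installed above and will differ in other versions).
Then add the following at line 2709: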

next_token_scores = outputs.logits[:, -1, :]

This is where the error originates: if the pre-process distribution step is executed, generation fails with the error below.

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/logits_process.py:455: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
  sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
Traceback (most recent call last):
  File "/root/Qwen_test.py", line 63, in <module>
    generated_ids = model.generate(
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._sample(
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 2736, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: Sync:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:158 NPU error, error code is 507018
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
E39999: Inner Error!
E39999: 2024-07-02-14:14:50.735.070  An exception occurred during AICPU execution, stream_id:23, task_id:2750, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:730]
        TraceBack (most recent call last):
        rtStreamSynchronizeWithTimeout execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]


DEVICE[1] PID[864803]:
EXCEPTION TASK:
  Exception info:TGID=864803, model id=65535, stream id=23, stream phase=SCHEDULE, task id=2750, task type=aicpu kernel, recently received task id=2750, recently send task id=2749, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2024-07-02-14:14:50.091.974, function=proc_aicpu_task_done, line=970, error code=0x2a
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EZ9999: Inner Error!
EZ9999: 2024-07-02-14:14:50.743.702  Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1776]
        TraceBack (most recent call last):
        Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1579]
        Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1512]
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.745.695  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.747.300  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.814.377  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.816.023  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.817.628  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.819.236  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.820.843  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
        Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.822.422  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
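The traceback ends at `torch.multinomial` inside `_sample`, so the sampling path is what fails here. Note that the patch described in this section feeds raw logits into sampling, so the step it comments out no longer has any effect. A less invasive option that may be worth trying before editing the library is greedy decoding, which does not go through `_sample` at all. This is an untested assumption rather than a confirmed fix on the 910B, and `greedy_generate_kwargs` is a hypothetical helper name.

```python
def greedy_generate_kwargs(max_new_tokens=512):
    """Keyword arguments asking generate() for greedy decoding."""
    return {
        "max_new_tokens": max_new_tokens,
        # do_sample=False selects greedy decoding, so torch.multinomial is not called.
        "do_sample": False,
    }


# Usage inside the chat loop (model and model_inputs come from the Main script):
# generated_ids = model.generate(model_inputs.input_ids, **greedy_generate_kwargs())
```

Greedy decoding makes the output deterministic, which changes the style of the replies compared with sampling.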

Results

Finally, run the Main script.

Code adaptation (streaming)

To be continued.
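The post does not yet include a streaming version. One possible approach, sketched here under assumptions and not tested on the 910B, is the `TextIteratorStreamer` class from transformers: `generate()` runs in a worker thread and the main thread prints text chunks as they arrive. `consume_stream` and `stream_reply` are hypothetical helper names, and re-selecting the NPU device inside the worker thread is an assumption about torch_npu's per-thread device handling.

```python
from threading import Thread


def consume_stream(chunks, write=lambda s: print(s, end="", flush=True)):
    """Write each text chunk as soon as it arrives; return the full reply."""
    parts = []
    for chunk in chunks:
        write(chunk)
        parts.append(chunk)
    return "".join(parts)


def stream_reply(model, tokenizer, messages, device, max_new_tokens=512):
    """Generate a reply for `messages` and print it piece by piece."""
    # Imported lazily so consume_stream stays usable without these packages.
    # torch_npu must already be imported, as in the Main script.
    import torch
    from transformers import TextIteratorStreamer

    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    # skip_prompt=True keeps the echoed prompt out of the stream.
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )

    def run_generate():
        # Assumption: the NPU device has to be selected again in a new thread.
        torch.npu.set_device(torch.device(device))
        model.generate(
            **model_inputs, streamer=streamer, max_new_tokens=max_new_tokens
        )

    worker = Thread(target=run_generate)
    worker.start()
    reply = consume_stream(streamer)  # blocks until generation finishes
    worker.join()
    return reply
```

Inside the chat loop of the Main script, the generate and decode lines could then be replaced by `response = stream_reply(model, tokenizer, messages, torch_device)`.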
