Huawei bug report: is the Huawei NPU really "far ahead of the pack"?


This post is a bug report I filed with the Ascend / pytorch community. In it I tested the NPU's actual compute performance and found a sizeable gap between the NPU's real usable memory and what was advertised at the time of sale (the compute issue is Problem 1, the memory issue is Problem 2).
So: far ahead of the pack, or...?

Bug summary

The NPU in this machine is an Atlas 300I Pro.
This issue reports two problems, with my environment information attached at the end; I hope to get an explanation soon.
Problem 1: inference is slow, at only 1.20 it/s, and eventually crashes.
Problem 2: NPU memory accounting seems to differ from GPU. Why?

Problem 1:

Code that triggers the error:

Running fp16 inference on the NPU:

import torch
import torch_npu
from accelerate import Accelerator
accelerator = Accelerator()
from accelerate import dispatch_model

device = accelerator.device

# source '/home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh'

x = torch.randn(2, 2).npu()
y = torch.randn(2, 2).npu()
z = x.mm(y)

print(z)
print(device)

from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download
from modelscope import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
model_dir = snapshot_download("qwen/Qwen-7B-Chat", revision = 'v1.1.4')

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained(model_dir, trust_remote_code=True) # generation length, top_p, and other hyperparameters can be adjusted here

# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# Hello! Glad to help you.

# 2nd dialogue turn
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
print(response)
# (sample output from the Qwen README: a story about Li Ming, a young man who perseveres through rejection and builds a successful tech startup)

# 3rd dialogue turn
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# "Striving to Found a Startup: A Young Person's Road to Success"

Addendum: I modified the code so that it is possible to tell at which step the error occurs; the corresponding console output is shown under the error output below.

import torch
import torch_npu
from accelerate import Accelerator
accelerator = Accelerator()
from accelerate import dispatch_model

device = accelerator.device

# source '/home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh'

x = torch.randn(2, 2).npu()
y = torch.randn(2, 2).npu()
z = x.mm(y)

print(z)
print(device)

from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download
from modelscope import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
model_dir = snapshot_download("qwen/Qwen-7B-Chat", revision = 'v1.1.4')

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained(model_dir, trust_remote_code=True) # generation length, top_p, and other hyperparameters can be adjusted here
response, history = model.chat(tokenizer, input("请输入问题:"), history=None)
print(response)
while True:
    # subsequent dialogue turns
    response, history = model.chat(tokenizer, input("请输入问题:"), history=history)
    print(response)
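
To turn "about as slow as CPU" into a number, here is a minimal timing sketch (my own addition, reusing the model and tokenizer objects from the script above; the tokens/s figure is approximate, since it re-encodes the decoded response):

import time

def timed_chat(model, tokenizer, prompt, history=None):
    # Measure wall-clock time and rough generated-tokens-per-second for one turn.
    start = time.perf_counter()
    response, history = model.chat(tokenizer, prompt, history=history)
    elapsed = time.perf_counter() - start
    n_tokens = len(tokenizer.encode(response))  # approximation of generated tokens
    print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.2f} tok/s")
    return response, history

response, history = timed_chat(model, tokenizer, "你好")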

Error output:

Inference with the code above takes roughly as long as pure CPU inference, and the third dialogue turn crashes. Console output:

(NPU) (base) [HwHiAiUser@bogon Code]$ /home/HwHiAiUser/下载/yes/envs/NPU/bin/python /home/HwHiAiUser/Code/main.py
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([[-1.8972, -0.0742],
        [-0.6470, -0.0174]], device='npu:0')
npu
2023-10-23 00:27:21,121 - modelscope - INFO - PyTorch version 2.1.0+cpu Found.
2023-10-23 00:27:21,122 - modelscope - INFO - Loading ast index from /home/HwHiAiUser/.cache/modelscope/ast_indexer
2023-10-23 00:27:21,141 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 068f7e60e6f05d224ec8ad9a969f5922 and a total number of 943 components indexed
2023-10-23 00:27:21,675 - modelscope - INFO - Use user-specified model revision: v1.1.4
/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/tiktoken/core.py:50: ResourceWarning: unclosed <ssl.SSLSocket fd=123, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.175.2', 57384), raddr=('39.101.130.40', 443)>
  self._core_bpe = _tiktoken.CoreBPE(mergeable_ranks, special_tokens, pat_str)
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████████████████| 8/8 [00:06<00:00,  1.20it/s]
[W OpCommand.cpp:75] Warning: [Check][offset] Check input storage_offset[%ld] = 0 failed, result is untrustworthy4096 (function operator())
/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/generation/logits_process.py:407: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/_internal/cpython-3.9.0/lib/python3.9/site-packages/torch/include/ATen/core/LegacyTypeDispatch.h:74.)
  sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
[W AddKernelNpu.cpp:86] Warning: The oprator of add is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
[W NeKernelNpu.cpp:28] Warning: The oprator of ne is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
你好!有什么我能为你做的吗?
好的,我给你讲一个年轻人奋斗创业最终取得成功的故事。这个故事叫做《奋斗》。

故事的主人公是一个叫做李明的年轻人,他出生在一个普通的家庭,但他有一个梦想,那就是成为一名企业家。他从小就对创业有着浓厚的兴趣,经常参加各种创业比赛,也曾经在大学期间创办过一家小型的创业公司。

然而,创业的道路并不容易,李明经历了许多挫折和困难。他的公司一度面临破产的危险,但他并没有放弃,而是更加努力地工作,寻找新的机会和资源。

最终,李明的努力得到了回报,他的公司开始慢慢发展起来,他也因此获得了许多荣誉和奖励。他的故事告诉我们,只要有梦想,有勇气,有毅力,就一定能够实现自己的创业梦想。
EZ9999: Inner Error!
EZ9999  Kernel task happen error, retCode=0x28, [aicpu timeout].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1574]
        TraceBack (most recent call last):
        rtStreamSynchronizeWithTimeout execute failed, reason=[aicpu timeout][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
        synchronize stream failed, runtime result = 507017[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]


DEVICE[0] PID[18400]: 
EXCEPTION STREAM:
  Exception info:TGID=18400, model id=65535, stream id=3, stream phase=3
  Message info[0]:RTS_HWTS: Aicpu timeout, slot_id=12, stream_id=3, task_id=6200
    Other info[0]:time=2023-10-23-00:50:30.892.993, function=process_hwts_timeout_exception, line=3745, error code=0x28
Traceback (most recent call last):
  File "/home/HwHiAiUser/Code/main.py", line 66, in <module>
    response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
  File "/home/HwHiAiUser/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1199, in chat
    outputs = self.generate(
  File "/home/HwHiAiUser/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1318, in generate
    return super().generate(
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/generation/utils.py", line 1652, in generate
    return self.sample(
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/generation/utils.py", line 2793, in sample
    if unfinished_sequences.max() == 0:
RuntimeError: ACL stream synchronize failed.
[W NPUStream.cpp:372] Warning: NPU warning, error code is 507017[Error]: 
[Error]: The aicpu execution times out. 
        Rectify the fault based on the error information in the log, or you can ask us at follwing gitee link by issues: https://gitee.com/ascend/pytorch/issue
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu timeout][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
EH9999  wait for compute device to finish failed, runtime result = 507017.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[W NPUStream.cpp:372] Warning: NPU warning, error code is 507017[Error]: 
[Error]: The aicpu execution times out. 
        Rectify the fault based on the error information in the log, or you can ask us at follwing gitee link by issues: https://gitee.com/ascend/pytorch/issue
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu timeout][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
EH9999  wait for compute device to finish failed, runtime result = 507017.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpaxnr032b'>
  _warnings.warn(warn_message, ResourceWarning)

Output of the modified code:

(NPU) (base) [HwHiAiUser@bogon Code]$ /home/HwHiAiUser/下载/yes/envs/NPU/bin/python /home/HwHiAiUser/Code/main.py
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([[-0.3798,  0.5290],
        [-0.7580, -0.6727]], device='npu:0')
npu
2023-10-23 08:28:35,064 - modelscope - INFO - PyTorch version 2.1.0+cpu Found.
2023-10-23 08:28:35,065 - modelscope - INFO - Loading ast index from /home/HwHiAiUser/.cache/modelscope/ast_indexer
2023-10-23 08:28:35,084 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 068f7e60e6f05d224ec8ad9a969f5922 and a total number of 943 components indexed
2023-10-23 08:28:35,538 - modelscope - INFO - Use user-specified model revision: v1.1.4
/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/tiktoken/core.py:50: ResourceWarning: unclosed <ssl.SSLSocket fd=124, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.175.2', 58846), raddr=('39.101.130.40', 443)>
  self._core_bpe = _tiktoken.CoreBPE(mergeable_ranks, special_tokens, pat_str)
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████████████████| 8/8 [00:05<00:00,  1.37it/s]
请输入问题:你好!你是谁
[W OpCommand.cpp:75] Warning: [Check][offset] Check input storage_offset[%ld] = 0 failed, result is untrustworthy4096 (function operator())
/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/generation/logits_process.py:407: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/_internal/cpython-3.9.0/lib/python3.9/site-packages/torch/include/ATen/core/LegacyTypeDispatch.h:74.)
  sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
[W AddKernelNpu.cpp:86] Warning: The oprator of add is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
[W NeKernelNpu.cpp:28] Warning: The oprator of ne is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
我是通义千问,由阿里云开发的AI助手。我被设计用来回答各种问题、提供信息和与用户进行对话。有什么我可以帮助你的吗?
请输入问题:
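
Aside: the AddKernelNpu / NeKernelNpu warnings in both runs say 64-bit add/ne ops fall back to slow "high accuracy" kernels. A possible workaround, sketched below as my own guess rather than a confirmed fix, is to cast 64-bit tensors down to 32-bit before they reach the NPU:

import torch
import torch_npu

# My own illustrative sketch, not a verified fix: tensors such as attention
# masks default to int64, which the warnings flag as slow; double is not
# supported at all (see the "double dtype" warning above), so cast both down.
mask = torch.ones(1, 128, dtype=torch.int64)
mask_npu = mask.to(torch.int32).npu()   # 32-bit integer op on device

x = torch.randn(2, 2, dtype=torch.float64)
x_npu = x.to(torch.float32).npu()       # avoid the double-to-float auto-cast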

Problem 2:

Code that triggers the error:

import torch
import torch_npu
from accelerate import Accelerator
accelerator = Accelerator()
from accelerate import dispatch_model

device = accelerator.device

# source '/home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh'

x = torch.randn(2, 2).npu()
y = torch.randn(2, 2).npu()
z = x.mm(y)

print(z)
print(device)

from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download
from modelscope import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
model_dir = snapshot_download("qwen/Qwen-7B-Chat", revision = 'v1.1.4')

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained(model_dir, trust_remote_code=True) # generation length, top_p, and other hyperparameters can be adjusted here

# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# Hello! Glad to help you.

# 2nd dialogue turn
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
print(response)
# (sample output from the Qwen README: a story about Li Ming, a young man who perseveres through rejection and builds a successful tech startup)

# 3rd dialogue turn
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# "Striving to Found a Startup: A Young Person's Road to Success"

Error output:

This loads the unquantized full-precision 7B model for inference. On a GPU this load would never use more than 20 GB of memory, yet here the NPU runs out of memory.
Moreover, the Atlas 300I Pro was advertised with 24 GB of memory when I bought it, but in practice only about 20 GB is available. I would like a reasonable explanation.
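
For context, a back-of-the-envelope check (my own arithmetic, not vendor data): Qwen-7B has roughly 7.7B parameters, so fp32 weights alone come to about 28 GiB, which would not fit even in the advertised 24 GB; fp16 halves that. This is consistent with the OOM below hitting at ~19 GiB with 5 of 8 shards loaded, though it says nothing about the separate 24 GB vs ~21.5 GB gap:

params = 7.7e9  # Qwen-7B parameter count (approximate)
GiB = 2**30
for name, bytes_per_param in [("fp32", 4), ("fp16", 2)]:
    print(f"{name}: {params * bytes_per_param / GiB:.1f} GiB for weights alone")
# fp32: 28.7 GiB  -> cannot fit in ~21.5 GB of device memory
# fp16: 14.3 GiB  -> fits
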
Below are the execution output and the error from the code above:

(NPU) (base) [HwHiAiUser@bogon Code]$ /home/HwHiAiUser/下载/yes/envs/NPU/bin/python /home/HwHiAiUser/Code/main.py
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([[ 0.0766,  0.2028],
        [-2.3419, -1.6132]], device='npu:0')
npu
2023-10-23 07:56:42,901 - modelscope - INFO - PyTorch version 2.1.0+cpu Found.
2023-10-23 07:56:42,901 - modelscope - INFO - Loading ast index from /home/HwHiAiUser/.cache/modelscope/ast_indexer
2023-10-23 07:56:42,919 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 068f7e60e6f05d224ec8ad9a969f5922 and a total number of 943 components indexed
2023-10-23 07:56:43,931 - modelscope - INFO - Use user-specified model revision: v1.1.4
/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/tiktoken/core.py:50: ResourceWarning: unclosed <ssl.SSLSocket fd=124, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.175.2', 57986), raddr=('39.101.130.40', 443)>
  self._core_bpe = _tiktoken.CoreBPE(mergeable_ranks, special_tokens, pat_str)
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
Flash attention will be disabled because it does NOT support fp32.
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards:  62%|█████████████▊        | 5/8 [00:06<00:04,  1.38s/it]
Traceback (most recent call last):
  File "/home/HwHiAiUser/Code/main.py", line 45, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True).eval()
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/modelscope/utils/hf_util.py", line 181, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
    return model_class.from_pretrained(
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/modelscope/utils/hf_util.py", line 78, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3307, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3695, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 741, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
RuntimeError: NPU out of memory. Tried to allocate 66.00 MiB (NPU 0; 0 bytes total capacity; 19.09 GiB already allocated; 19.09 GiB current active; 0 bytes free; 19.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpuhsmwj2v'>
  _warnings.warn(warn_message, ResourceWarning)
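
The OOM message itself suggests tuning max_split_size_mb. On CUDA that is done through PYTORCH_CUDA_ALLOC_CONF; torch_npu appears to mirror the knob as PYTORCH_NPU_ALLOC_CONF, though I have not verified the variable name on this setup:

import os
# Assumed by analogy with PYTORCH_CUDA_ALLOC_CONF; must be set before the
# first NPU allocation. Unverified on CANN 7.0.RC1.alpha003 / torch 2.1.0.
os.environ.setdefault("PYTORCH_NPU_ALLOC_CONF", "max_split_size_mb:128")

import torch
import torch_npu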

Environment information

Firmware version check

(NPU) [HwHiAiUser@localhost ~]$ sudo /usr/local/Ascend/driver/tools/upgrade-tool --device_index -1 --component -1 --version
{
Get component version(6.4.12.1.241) succeed for deviceId(0), componentType(11).
	{"device_id":0, "component":hboot1a, "version":6.4.12.1.241}
Get component version(6.4.12.1.241) succeed for deviceId(0), componentType(12).
	{"device_id":0, "component":hboot1b, "version":6.4.12.1.241}
Get component version(6.4.12.1.241) succeed for deviceId(0), componentType(18).
	{"device_id":0, "component":hlink, "version":6.4.12.1.241}
}

npu-smi info

(NPU) [HwHiAiUser@localhost ~]$ npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc2                                 Version: 23.0.rc2                                     |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
| Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
+===============================+=================+======================================================+
| 8       310P3                 | OK              | NA           37                0     / 0             |
| 0       0                     | 0000:01:00.0    | 0            1700 / 21527                            |
+===============================+=================+======================================================+
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Chip                  | Process id      | Process name             | Process memory(MB)        |
+===============================+=================+======================================================+
| No running processes found in NPU 8                                                                    |
+===============================+=================+======================================================+
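
Note that npu-smi reports 21527 MB total. Part of the gap to the advertised "24 GB" may simply be decimal-vs-binary units, with the remainder reserved on the device; a quick check of that hypothesis (my arithmetic only, not vendor documentation):

advertised = 24e9 / 2**20   # 24 decimal GB expressed in MiB: ~22888
reported = 21527            # MiB, as shown by npu-smi above
gap = advertised - reported
print(f"{advertised:.0f} MiB advertised vs {reported} MiB reported "
      f"-> {gap:.0f} MiB (~{gap / 1024:.1f} GiB) unaccounted for")
# ~1361 MiB (~1.3 GiB), plausibly firmware/OS reservation -- unconfirmed.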

CANN installation

CANN 7.0.RC1.alpha003, matching pytorch 2.1.0, is installed and the environment is configured correctly; the following code runs normally:

import torch
import torch_npu

# source '/home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh'

x = torch.randn(2, 2).npu()
y = torch.randn(2, 2).npu()
z = x.mm(y)

print(z)

Output:

(NPU) (base) [HwHiAiUser@bogon Code]$ /home/HwHiAiUser/下载/yes/envs/NPU/bin/python /home/HwHiAiUser/Code/main.py
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([[ 0.0766,  0.2028],
        [-2.3419, -1.6132]], device='npu:0')
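
To inspect memory from Python, torch_npu mirrors the torch.cuda memory API (function names assumed from the CUDA analogue; unverified on this exact version):

import torch
import torch_npu

x = torch.randn(1024, 1024).npu()  # allocate something first
# Assumed to mirror torch.cuda.memory_allocated / memory_reserved.
print(torch.npu.memory_allocated() / 2**20, "MiB allocated")
print(torch.npu.memory_reserved() / 2**20, "MiB reserved")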
