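NPU Pitfalls Log

I previously ran some generation tasks with Qwen-series models on Ascend 910A cards. I'm posting the pitfalls I hit in case they help anyone who runs into similar problems :)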

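Qwen QwQ-32B environment setup

Code deployment

Cleaning generated content

Cleaning already-generated content

Optimizing the generation process

The "Failed to initialize the HCCP process" problem

Assistant history replies are lost

Inference execution failure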

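Qwen QwQ-32B environment setup

QwQ-32B is a slow-thinking (reasoning) model: the thinking process is usually not needed, but the final answer is.

Create the development environment from the qwq-32b image.

The model path can be found from the training job; then launch the code.

Code deployment

Switch the model to Qwen QwQ-32B.

Cleaning generated content

Handling the <|im_end|> at the end of content

This is done with string processing.

The original processing code:

cleaned_text = generated_text1.strip().replace("```markdown", "").replace("```", "")

Now a regular expression keeps only the text after </think> and before <|im_end|>: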

import re

def clean_text(generated_text1):
    # First remove the unwanted markdown markers
    cleaned_text = generated_text1.strip().replace("```markdown", "").replace("```", "")
    # Match the content after </think> and before <|im_end|>
    match = re.search(r'</think>(.*?)<\|im_end\|>', cleaned_text, re.DOTALL)
    if match:
        return match.group(1).strip()  # Extract the content and strip surrounding whitespace
    else:
        return ""  # No match: return an empty string

# Test sample
generated_text1 = """
Hello, this is some text.
</think> ```markdown This is the part we want to keep. ```<|im_end|> More text here.
"""

cleaned_text = clean_text(generated_text1)
print(cleaned_text)

Cleaning already-generated content

Apply clean_text to the data that has already been generated.

Iterate over all files in a folder:

import os

def read_all_files(folder_path):
    """
    Read every file in the given folder and print its content in turn.

    :param folder_path: path of the folder
    """
    if not os.path.isdir(folder_path):
        print("Error: the given path is not a valid folder")
        return

    # Iterate over all entries in the folder
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        # Only handle regular files (skip sub-folders)
        if os.path.isfile(file_path):
            try:
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                print(f"===== File: {filename} =====")
                print(content)
                print("\n")
            except Exception as e:
                print(f"Cannot read file {filename}: {e}")

# Folder to read
folder_path = "your_folder_path_here"
read_all_files(folder_path)
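The reader above only prints each file. A minimal sketch of actually applying the cleaning step to every file might look like the following. The name clean_folder, the separate output folder, and the decision to skip files with no match are illustrative assumptions rather than the original script; clean_text is repeated here only to keep the block self-contained.

```python
import os
import re

FENCE = "`" * 3  # the markdown code-fence marker removed during cleaning


def clean_text(generated_text):
    """Keep the text after </think> and before <|im_end|>."""
    text = generated_text.strip().replace(FENCE + "markdown", "").replace(FENCE, "")
    match = re.search(r"</think>(.*?)<\|im_end\|>", text, re.DOTALL)
    return match.group(1).strip() if match else ""


def clean_folder(src_dir, dst_dir):
    """Clean every regular file in src_dir and write the result to dst_dir.

    Returns the names of the files that were written. Files where nothing
    matched are skipped, so an empty result never replaces real content.
    """
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for filename in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, filename)
        if not os.path.isfile(src_path):
            continue  # skip sub-folders
        with open(src_path, "r", encoding="utf-8") as f:
            cleaned = clean_text(f.read())
        if not cleaned:
            continue
        with open(os.path.join(dst_dir, filename), "w", encoding="utf-8") as f:
            f.write(cleaned)
        written.append(filename)
    return written
```

Writing to a separate folder keeps the raw generations around, which is useful if the cleaning rule needs to change later.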

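Optimizing the generation process

Keeping the intermediate knowledge points

Generation is split into two stages, outlines generation and contents generation. Outlines generation saves the generated knowledge points, and contents generation reads them back.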
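One way to hand the knowledge points from the outlines stage to the contents stage is a small JSON file per topic. This is only a sketch: the function names, the one-file-per-topic layout, and the JSON keys are assumptions made for illustration, not the format used in the original project.

```python
import json
import os


def save_outline(outline_dir, topic, knowledge_points):
    """Outlines stage: persist the generated knowledge points for one topic."""
    os.makedirs(outline_dir, exist_ok=True)
    path = os.path.join(outline_dir, f"{topic}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(
            {"topic": topic, "knowledge_points": knowledge_points},
            f,
            ensure_ascii=False,  # keep non-ASCII text readable in the file
            indent=2,
        )
    return path


def load_outline(outline_dir, topic):
    """Contents stage: read the knowledge points saved by the outlines stage."""
    path = os.path.join(outline_dir, f"{topic}.json")
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["knowledge_points"]
```

Because the intermediate result lives on disk, the contents stage can be rerun after a crash without regenerating the outlines.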

The "Failed to initialize the HCCP process" problem

When warmup fails to start, the following error is reported:

EJ0001: [PID: 2369411] 2025-03-24-19:23:56.333.074 Failed to initialize the HCCP process.

Reason: Maybe the last training process is running.

Solution: Wait for 10s after killing the last training process and try again.

References:

https://llamafactory.readthedocs.io/zh-cn/latest/advanced/npu.html

https://github.com/hiyouga/LLaMA-Factory/issues/3839

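Fixing the "Failed to initialize the HCCP process" problem

Bind each process to the device given by its local rank:

import os
import torch_npu

local_rank = int(os.environ["LOCAL_RANK"])
torch_npu.npu.set_device(local_rank)

Kill the leftover Python processes (note that this kills every process whose name matches python, not only the stale training job):

pkill -9 python

Kill all processes on the device side, wait 10 s, then restart training.

After applying all of these, device memory usage finally dropped; after waiting a moment, the restart succeeded.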
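The "wait 10 s and try again" advice from the error message can be wrapped in a small retry helper. This is a sketch under assumptions: retry_with_wait and start_fn are hypothetical names, and it assumes the failed start surfaces as a RuntimeError, which may not match how a given launcher reports the HCCP error.

```python
import time


def retry_with_wait(start_fn, retries=3, wait_seconds=10.0, sleep=time.sleep):
    """Call start_fn; if it raises RuntimeError, wait and try again.

    start_fn is whatever launches the job (hypothetical here). The sleep
    argument is injectable so the helper can be tested without waiting.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return start_fn()
        except RuntimeError as err:  # assumption: the start failure is a RuntimeError
            last_error = err
            if attempt < retries:
                sleep(wait_seconds)  # give the device time to release resources
    raise last_error
```

The helper does not kill anything itself; stale processes still have to be cleaned up before the retry can succeed.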

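Assistant history replies are lost

Once the knowledge points were read back successfully, I found that they disappeared when they were put into the template and passed to the model for inference.

When using the Qwen-series model QwQ-32B for multi-turn dialogue, the content whose 'role' is 'assistent' is not included in the text returned by apply_chat_template. The rest of this section works through that problem.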

messages = [
    {'role': 'assistent', 'content': outline},
    {'role': 'user', 'content': prompt1}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

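The QwQ-32B homepage is https://huggingface.co/Qwen/QwQ-32B

This code runs in Ascend's is_chat_model mode. Could that cause apply_chat_template() to silently ignore the assistant's history replies?

Hypothesis: when is_chat_model=True, apply_chat_template() may ignore the assistant's history replies, because Qwen's apply_chat_template might:
- format only the messages provided by the user, without including the assistant's history replies;
- thereby lose the assistant content, so the conversation history would have to be filled in manually.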

Search keywords:

tokenizer.apply_chat_template

qwen

templates-for-chat-models:

https://huggingface.co/docs/transformers/main/chat_templating?template=Zephyr#templates-for-chat-models

A model may have several different templates for different use cases. For example, a model may have templates for regular chat, tool use, and RAG.

When there are multiple templates, the chat template is a dictionary. Each key corresponds to the name of a template. apply_chat_template handles multiple templates based on their name. It looks for a template named default in most cases and if it can’t find one, it raises an error.

To access templates with other names, pass the template name to the chat_template parameter in apply_chat_template. For example, if you’re using a RAG template then set chat_template="rag".

Setting the template:

tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"

Viewing and editing the template:

https://huggingface.co/docs/transformers/main/chat_templating_writing

template = tokenizer.chat_template

template = template.replace("SYS", "SYSTEM") # Change the system token

tokenizer.chat_template = template # Set the new template

The template is saved in the tokenizer_config.json file. Upload it to the Hub with push_to_hub() so you can reuse it later and make sure everyone is using the right template for your model.

In short: set the chat_template attribute, test it with apply_chat_template(), and once it works as expected, upload it:

tokenizer.push_to_hub("model_name")

Download tokenizer_config.json

This is Qwen's chat template, including the part that handles the assistant role:

"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- '' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" and not message.tool_calls %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != 
\"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n<think>\\n' }}\n{%- endif %}\n",

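What the template means

The assistant-related part of the template puts the model's earlier replies (and any tool-call information) into the conversation context in a fixed format, so that later generation can refer to them. There are two branches:

Plain reply (no tool calls):
When a message has role "assistant" and no tool_calls, the template processes message.content as follows:
- It takes the text after the last </think> tag with message.content.split('</think>')[-1] and strips leading newlines with lstrip('\n'). The model's "thinking" part is discarded and only the final answer is kept.
- The answer is then wrapped in <|im_start|>assistant\n ... <|im_end|>\n so that it enters the conversation history as a complete assistant message.
Reply with tool calls:
When the assistant message has tool_calls (the model called external functions while answering), the template does the following:
- It extracts the answer part in the same way (take the last piece after splitting, strip leading newlines) and emits <|im_start|>assistant followed by that content, if there is any.
- It then loops over message.tool_calls. For each tool call (using tool_call.function when it is defined), the template emits a block:
  - The block starts with <tool_call>, contains JSON with the function name ("name") and the arguments ("arguments", converted with tojson), and ends with </tool_call>.
- Finally, it closes the assistant message with <|im_end|>\n.

In addition, after all messages have been processed, if add_generation_prompt is set, the template appends <|im_start|>assistant\n<think>\n to open a new assistant turn.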
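The plain-reply branch can be mirrored in a few lines of Python to check what a given history message turns into. This is a hand-written approximation of that one branch for inspection only; render_assistant_turn is a hypothetical helper, not part of transformers, and it does not cover tool calls.

```python
def render_assistant_turn(content):
    """Approximate the template's branch for an assistant message without tool calls."""
    # Keep only what follows the last </think>, then drop leading newlines,
    # as the template does with split('</think>')[-1].lstrip('\n').
    visible = content.split("</think>")[-1].lstrip("\n")
    return "<|im_start|>assistant\n" + visible + "<|im_end|>\n"


# A reply that still contains its thinking block.
with_thinking = "<think>\nsome reasoning\n</think>\n\nfinal answer"
print(render_assistant_turn(with_thinking))

# A reply with no </think> at all is passed through unchanged.
print(render_assistant_turn("outline text"))
```

Since split returns the whole string when the separator is absent, content without a </think> tag is kept in full by this branch.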

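So I tried adding the </think> end-of-thinking token at the start of the outline to see whether the content would then come through.

Even with it added, the content was still missing. Debugging showed that the role keyword had been misspelled: assistant was written as assistent. The template only emits messages whose role is system, user, assistant, or tool, so a message with an unknown role is silently dropped.

After correcting the spelling, the history reply is kept and generation works normally.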
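Because the template drops unknown roles without any warning, a cheap guard is to validate the roles before calling apply_chat_template. The sketch below is an assumption-level helper: KNOWN_ROLES is taken from the branches visible in the template above, and find_unknown_roles is a hypothetical name.

```python
# Roles that the template shown above has a branch for.
KNOWN_ROLES = {"system", "user", "assistant", "tool"}


def find_unknown_roles(messages):
    """Return (index, role) for every message whose role the template would skip."""
    return [
        (i, m.get("role"))
        for i, m in enumerate(messages)
        if m.get("role") not in KNOWN_ROLES
    ]


messages = [
    {"role": "assistent", "content": "outline"},  # misspelled on purpose
    {"role": "user", "content": "prompt"},
]
problems = find_unknown_roles(messages)
if problems:
    print("Unknown roles:", problems)
```

Raising an error instead of printing would make the typo impossible to miss in a batch job.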