Deep Learning: LLM Decoding and Distributed Inference with MindSpore NLP

2025/1/24 1:09:50

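The LLM inference pipeline

1. The user enters a prompt

Suppose the user enters: “从前,有一只小猫,它喜欢……” ("Once upon a time, there was a kitten that liked...")

The goal is for the model to continue this into a complete story.

2. The model processes the input

2.1 Tokenization: the prompt is split into tokens the model can work with, typically subwords or whole words.

For example:

"从前,有一只小猫,它喜欢……" might be tokenized as:

["从前", ",", "有", "一只", "小猫", ",", "它", "喜欢", "……"]

Each token is then mapped to its index (ID) in the model's vocabulary. These IDs are the input_ids returned by the tokenizer.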
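The token-to-ID mapping can be sketched with a plain dictionary. The vocabulary below is a hypothetical toy table invented for illustration; a real tokenizer ships its own vocabulary with the model, and its IDs will differ.

```python
# Hypothetical toy vocabulary; real tokenizers load theirs from the model files.
toy_vocab = {"从前": 0, ",": 1, "有": 2, "一只": 3, "小猫": 4,
             "它": 5, "喜欢": 6, "……": 7, "<unk>": 8}

def tokens_to_ids(tokens, vocab):
    """Map each token to its vocabulary index, falling back to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokens]

tokens = ["从前", ",", "有", "一只", "小猫", ",", "它", "喜欢", "……"]
input_ids = [tokens_to_ids(tokens, toy_vocab)]   # outer list is the batch dimension
print(input_ids)
print(len(input_ids), len(input_ids[0]))         # batch size and sequence length
```

The nested list mirrors the (batch_size, seq_length) layout of input_ids discussed in the next step.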

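2.2 Converting IDs to embeddings

Each token is converted into a high-dimensional vector (an embedding) that carries semantic information. The model's embedding layer maps token indices to these vectors.

The user's input_ids have shape (1, 9): one sample in the batch, with a sequence length of 9.

The embedding layer maps each token index to a vector of dimension hidden_size, the model's hidden dimension. hidden_size is a hyperparameter chosen by the model's designers.

After the embedding layer, the input shape changes from (batch_size, seq_length) to (batch_size, seq_length, hidden_size). For example, (1, 9) becomes (1, 9, hidden_size).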
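A minimal sketch of the lookup, using plain Python lists so the shape change is easy to see. The sizes (vocab_size = 9, hidden_size = 4) and the random table are assumptions for illustration only; a trained model uses learned weights and a much larger hidden_size.

```python
import random

vocab_size, hidden_size = 9, 4          # toy sizes, chosen only for illustration
rng = random.Random(0)

# Embedding table: one hidden_size-dimensional row per vocabulary entry.
embedding_table = [[rng.uniform(-1, 1) for _ in range(hidden_size)]
                   for _ in range(vocab_size)]

def embed(batch_ids, table):
    """Replace every token id with its row of the embedding table."""
    return [[table[i] for i in seq] for seq in batch_ids]

input_ids = [[0, 1, 2, 3, 4, 1, 5, 6, 7]]          # shape (1, 9)
embeddings = embed(input_ids, embedding_table)      # shape (1, 9, hidden_size)
shape = (len(embeddings), len(embeddings[0]), len(embeddings[0][0]))
print(shape)
```

Positions 1 and 5 hold the same token ID, so they receive the same vector; telling them apart is the job of the positional information added next.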

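2.3 Adding positional encoding to the tensor

To preserve the order of the input sequence, the model attaches position information to every token. In the classic Transformer design, a positional encoding is added to the token embedding to form the final input representation.

The positional encoding has the same hidden_size as the input embedding, and the two are added element-wise.

  • Shape of the input embedding
    The embedding output has shape (batch_size, seq_length, hidden_size), where hidden_size is the embedding dimension of each token.

  • Shape of the positional encoding
    The positional encoding has shape (seq_length, hidden_size) and is broadcast across the batch, so the tensor that is added matches the embedding's shape (batch_size, seq_length, hidden_size).

Positional encoding injects position information directly into the representation, so the order of the sentence is preserved.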
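As one concrete instance, the sketch below builds the sinusoidal encoding from the original Transformer and adds it to dummy embeddings. This is an assumption made for illustration: the article does not say which scheme is meant, and many recent models (Llama-style models among them) use rotary position embeddings inside attention instead of an additive table.

```python
import math

def sinusoidal_positions(seq_length, hidden_size):
    """Sinusoidal position table of shape (seq_length, hidden_size)."""
    table = []
    for pos in range(seq_length):
        row = []
        for i in range(hidden_size):
            angle = pos / (10000 ** (2 * (i // 2) / hidden_size))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings, positions):
    """Element-wise sum; the same position table is reused for every sample."""
    return [[[e + p for e, p in zip(tok, pos)] for tok, pos in zip(seq, positions)]
            for seq in embeddings]

embeddings = [[[0.0] * 4 for _ in range(9)]]     # dummy (1, 9, 4) embeddings
pe = sinusoidal_positions(9, 4)
x = add_positions(embeddings, pe)
print(len(x), len(x[0]), len(x[0][0]))           # shape is unchanged by the addition
print(x[0][0])                                   # position 0: sin(0) and cos(0) alternate
```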

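3. Forward pass

The processed input tensor is fed through the model for the forward computation.

4. Generating output

In autoregressive generation, the model produces tokens step by step, one token per step, so the output shape changes as generation proceeds.

  • Input shape: (1, 9)

  • Shape of the model's output distribution: (1, 9, vocab_size); only the last position is used to predict the next token

  • Shape of the next generated token: (1, 1)

4.1 Output probability distribution

The output of the last Transformer layer passes through a linear layer and a softmax function, producing a probability for every possible token. For example, the model might predict “玩耍” (play) as the next token with probability 0.4, “睡觉” (sleep) with probability 0.3, and so on.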
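The softmax step can be sketched in a few lines. The candidate tokens and logit values below are made up for illustration and are not taken from any model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)                          # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens; the values are invented.
candidates = ["玩耍", "睡觉", "吃鱼"]
logits = [2.0, 1.7, 1.0]
probs = softmax(logits)
for tok, p in zip(candidates, probs):
    print(tok, round(p, 3))
```

Softmax preserves the ordering of the logits, so the token with the largest logit always has the largest probability.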

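4.2 Decoding strategy

The model picks the next token based on the probability distribution. Common decoding strategies include:

  • Greedy search
    Pick the token with the highest probability, for example “玩耍” as the next token.

  • Beam search
    Keep several candidate sequences and choose the one with the highest overall probability.

  • Sampling
    Draw the next token at random according to the probability distribution.

The distribution the model outputs and the distribution used for random sampling are directly linked: sampling is performed on the model's output distribution.

  • The basis of random sampling
    Random sampling depends directly on the model's output distribution, which determines how likely each token is to be drawn.

  • The role of the probability distribution
    The distribution reflects the model's "confidence" in, or "preference" for, each token. High-probability tokens are more likely to be drawn, but low-probability tokens can be drawn too, especially in settings that favor diversity.

  • Uncertainty of the sampled result
    Because sampling is random, the result can differ from run to run even when the distribution is identical. This contrasts with greedy search, which always picks the most probable token.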
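The contrast between greedy search and sampling can be sketched as follows. The probabilities 0.4 and 0.3 come from the example above; the third token and its probability are assumptions added so that the distribution sums to 1.

```python
import random

tokens = ["玩耍", "睡觉", "吃鱼"]
probs = [0.4, 0.3, 0.3]          # third entry is an assumed filler value

def greedy(tokens, probs):
    """Always return the most probable token."""
    return tokens[max(range(len(probs)), key=probs.__getitem__)]

def sample(tokens, probs, rng):
    """Draw one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=probs, k=1)[0]

rng = random.Random(42)
greedy_picks = {greedy(tokens, probs) for _ in range(1000)}
sampled_picks = {sample(tokens, probs, rng) for _ in range(1000)}
print(greedy_picks)              # a single token, no matter how often we repeat
print(sampled_picks)             # several different tokens show up
```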

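Top-K and Top-P can both be combined with temperature.

5. Iterative generation

5.1 Feeding tokens back in

The model appends each generated token to the input and generates the next one. For example:

  1. Input prompt: “从前,有一只小猫,它喜欢……”

  2. The model generates: “玩耍”

  3. New input: “从前,有一只小猫,它喜欢玩耍”

  4. The model continues with: “,每天……”

Generation continues until the maximum length is reached or a special end-of-sequence token (such as <EOS>) is produced.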
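The loop can be sketched with a stand-in for the model. NEXT_TOKEN and toy_next_token are hypothetical placeholders that look up the next token from the last one; a real model would run a forward pass and apply a decoding strategy at that point.

```python
# Hypothetical stand-in for a language model: maps the last token to the next one.
NEXT_TOKEN = {"喜欢": "玩耍", "玩耍": ",", ",": "每天", "每天": "<EOS>"}

def toy_next_token(sequence):
    return NEXT_TOKEN.get(sequence[-1], "<EOS>")

def generate(prompt_tokens, max_new_tokens, eos="<EOS>"):
    """Append one token per step until EOS appears or the budget runs out."""
    sequence = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = toy_next_token(sequence)
        if nxt == eos:                 # stop condition 1: end-of-sequence token
            break
        sequence.append(nxt)           # feed the new token back in as input
    return sequence                    # stop condition 2: max_new_tokens reached

prompt = ["它", "喜欢"]
full = generate(prompt, max_new_tokens=10)
short = generate(prompt, max_new_tokens=2)
print(full)
print(short)
```

The two calls show both stop conditions: the first ends on <EOS>, the second is cut off by the token budget.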

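6. Final output

In the end, the complete story generated by the model might be:
“从前,有一只小猫,它喜欢玩耍,每天都会在花园里追逐蝴蝶。有一天,它遇到了一只小鸟……”

LLMs do not usually rely on greedy decoding alone (always emitting the most probable token). With greedy decoding, the model gives exactly the same reply to the same input every time, because the parameters are fixed at inference time and nothing is random. Other decoding strategies are therefore used, as described next.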

Decoding strategies for LLMs

Suppose the model is predicting the next token after “The cat” and returns the following:

• sat  (0.5)

• jumped  (0.3)

• is  (0.1)

• slept  (0.05)

• runs  (0.05)

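1. Top-k sampling

Top-k sampling introduces randomness into decoding by restricting the candidates to the k most probable tokens. The next token is then sampled at random from those k tokens, using their renormalized probabilities.

In the example, with k = 2, Top-k sampling keeps sat (0.5) and jumped (0.3) and samples the next token from these two.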
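A minimal sketch of Top-k filtering on the “The cat” distribution, assuming k = 2. The function name top_k_filter is illustrative and is not a MindNLP API.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

probs = {"sat": 0.5, "jumped": 0.3, "is": 0.1, "slept": 0.05, "runs": 0.05}
filtered = top_k_filter(probs, k=2)
print(filtered)                       # only "sat" and "jumped" remain

rng = random.Random(0)
next_token = rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0]
print(next_token)
```

After renormalization the two survivors carry 0.5 / 0.8 and 0.3 / 0.8 of the probability mass, so "sat" is still favored but "jumped" is drawn fairly often.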

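2. Top-p sampling

Top-p sampling sets a threshold P, then takes tokens in descending order of probability until their cumulative probability reaches P. The next token is sampled from the n tokens selected this way.

In the example, with P = 0.9, Top-p sampling keeps sat (0.5), jumped (0.3) and is (0.1), whose probabilities sum to 0.9, and samples the next token from these three.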
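A minimal sketch of Top-p (nucleus) filtering on the same distribution, assuming P = 0.9. The function name top_p_filter and the small floating-point tolerance are illustrative choices, not part of any library.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p - 1e-12:      # tolerance for floating-point sums
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

probs = {"sat": 0.5, "jumped": 0.3, "is": 0.1, "slept": 0.05, "runs": 0.05}
nucleus = top_p_filter(probs, p=0.9)
print(nucleus)                           # "sat", "jumped" and "is" remain
print(top_p_filter(probs, p=0.5))        # a smaller p keeps only "sat"
```

Unlike Top-k, the number of surviving tokens adapts to the shape of the distribution: a peaked distribution keeps few tokens, a flat one keeps many.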

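3. Temperature sampling

Temperature is a hyperparameter that controls the shape of the distribution tokens are drawn from. The logits are scaled by a factor of 1/T before the softmax, where T is the temperature.

  • When T = 1, the distribution is unchanged.
  • When T > 1, the output becomes more random and low-probability tokens appear more often.
  • When T < 1, the output becomes more deterministic and high-probability tokens are chosen more often.

At high temperature the model becomes "more creative"; at low temperature it becomes "more precise".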
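The three cases can be checked on the “The cat” distribution. Since only probabilities are given there, the sketch recovers logits as log(p) (valid up to an additive constant) and applies softmax(log(p) / T); the helper name apply_temperature is illustrative.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a distribution: softmax(log(p) / T), i.e. p ** (1 / T) renormalized."""
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

probs = [0.5, 0.3, 0.1, 0.05, 0.05]      # sat, jumped, is, slept, runs
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

With T = 0.5 the mass on "sat" rises above 0.5, with T = 2 it falls below 0.5, and with T = 1 the distribution is returned unchanged.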

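4. Beam search

Beam search is a more refined variant of greedy search that keeps the top k sequences and extends them in parallel.

  • At each step, the model proposes the most likely next tokens for each of the k sequences.
  • The beam width (k) determines how many candidate sequences are kept at each step.
  • After each step, the candidates are ranked by cumulative probability and the k most probable sequences are kept for further expansion.

In the example, suppose the beam width is 2. The 2 most probable tokens are selected to continue generation:

“The cat sat” (cumulative probability: 0.5)

“The cat jumped” (cumulative probability: 0.3)

The model then keeps extending both sequences, for example:

“The cat sat on the mat”

“The cat jumped over the fence”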
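The procedure can be sketched with a toy model. TOY_MODEL is a hypothetical lookup table keyed by the last token: its first entry reuses the “The cat” probabilities, while the second-step probabilities are invented for illustration.

```python
import math

# Hypothetical next-token distributions keyed by the last token.
TOY_MODEL = {
    "cat": {"sat": 0.5, "jumped": 0.3, "is": 0.1, "slept": 0.05, "runs": 0.05},
    "sat": {"on": 0.6, "down": 0.4},          # invented values
    "jumped": {"over": 0.9, "up": 0.1},       # invented values
}

def beam_search(start, steps, beam_width):
    beams = [(0.0, list(start))]              # (cumulative log-probability, tokens)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            # Sequences whose last token has no table entry are simply dropped.
            for tok, p in TOY_MODEL.get(seq[-1], {}).items():
                candidates.append((score + math.log(p), seq + [tok]))
        if not candidates:
            break
        # Rank all extensions together and keep only the best beam_width of them.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams

for score, seq in beam_search(["The", "cat"], steps=2, beam_width=2):
    print(" ".join(seq), round(math.exp(score), 3))
```

Scores are summed in log space, which is equivalent to multiplying probabilities but avoids underflow on long sequences.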

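Later developments of beam search include Diverse Beam Search.

When to use each decoding strategy

  • Greedy decoding
    A simple, computationally cheap choice when text must be generated quickly and output quality is not critical. It picks the token with the largest logit as the next output, which suits fast-response scenarios such as a chatbot's initial reply.

  • Beam search
    Suited to tasks that need tight control over output quality, such as machine translation or question answering. Considering several candidate sequences at once can improve the accuracy and fluency of a translation.

  • Sampling
    Suited to tasks that need diverse output, such as creative writing or answers to open-ended questions. Tokens are drawn from the vocabulary according to the probability distribution, and parameters such as temperature control the randomness.

  • Top-K / Top-P
    Suited to tasks that need sampling with better generation quality. By restricting the set of candidate tokens, Top-K and Top-P improve coherence and reduce repetition, which makes them a good fit for tasks that require high-quality output.

  • Temperature sampling
    Suited to tasks that benefit from extra randomness, such as creative writing or exploratory tasks. The temperature adjusts how random the output is: lower values bring sampling closer to deterministic decoding, while higher values increase randomness.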

Decoding and inference with MindSpore

Create a Notebook

mindspore==2.3.0, cann==8.0

Upgrade MindSpore

pip install --upgrade mindspore

Clone MindNLP

git clone https://github.com/mindspore-lab/mindnlp.git

Update MindNLP

cd mindnlp
bash scripts/build_and_reinstall.sh

Uninstall MindFormers

pip uninstall mindformers

Load the model and convert the input

import mindspore
from mindnlp.transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "LLM-Research/Meta-Llama-3-8B-Instruct"
# Download the Llama 3 tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, mirror="modelscope")

# Download the Llama 3 model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    ms_dtype=mindspore.float16,
    mirror="modelscope"
)

# Input messages
messages = [
    {"role": "system", "content": "You are a psychological counsellor, who is good at emotional comfort"},
    {"role": "user", "content": "I don't sleep well for a long time."}
]
# Convert the messages to input_ids
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="ms"
)
# Tokens that terminate generation
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
# Generate the output
outputs = model.generate(
    input_ids, # input tokens
    max_new_tokens=50, # limit the output length
    eos_token_id=terminators, # terminator tokens
    do_sample=True, # sample from the output probability distribution
    top_p=1.0 # top-p value (1.0 keeps the whole distribution)
)

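Greedy strategy

# Greedy strategy
# Generate the output
outputs = model.generate(
    input_ids, # input tokens
    max_new_tokens=1000, # limit the output length
    eos_token_id=terminators, # terminator tokens
    do_sample=False, # no sampling: always take the most probable token
)

# Drop the prompt tokens and decode only the newly generated part
response = outputs[0][input_ids.shape[-1]:]
tokenizer.decode(response, skip_special_tokens=True)

Model output: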

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? What's been keeping you awake at night? Is it stress, anxiety, or something else?\n\nAlso, have you noticed any patterns or triggers that might be contributing to your insomnia? For example, do you find yourself lying awake for hours, or do you wake up multiple times during the night?\n\nRemember, I'm here to listen and support you, and I want you to feel comfortable sharing as much or as little as you'd like."

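Repeating the run several times produces the same output every time.

The temperature parameter

temperature controls the randomness and diversity of the generated text by reshaping the probability distribution computed from the output logits.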

import mindspore
from mindspore import Tensor
import numpy as np
import mindspore.ops as ops

logits = Tensor(np.array([[0.5, 1.2, -1.0, 0.1]]), mindspore.float32)

probs = ops.softmax(logits, axis=-1)
# low temperature = 0.5
# the distribution becomes more concentrated (sharper)
probs_low = ops.softmax(logits / 0.5, axis=-1)
# high temperature = 2
# the distribution becomes more spread out (flatter)
probs_high = ops.softmax(logits / 2, axis=-1)

probs, probs_low, probs_high
(Tensor(shape=[1, 4], dtype=Float32, value=
 [[ 2.55937576e-01,  5.15394986e-01,  5.71073927e-02,  1.71560094e-01]]),
 Tensor(shape=[1, 4], dtype=Float32, value=
 [[ 1.80040166e-01,  7.30098903e-01,  8.96367151e-03,  8.08972642e-02]]),
 Tensor(shape=[1, 4], dtype=Float32, value=
 [[ 2.69529819e-01,  3.82481009e-01,  1.27316862e-01,  2.20672339e-01]]))

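As the results show, the higher the temperature, the flatter the distribution; the lower the temperature, the more concentrated it is.

temperature=1

# Generate the output
outputs = model.generate(
    input_ids, # input tokens
    max_new_tokens=1000, # limit the output length
    eos_token_id=terminators, # terminator tokens
    do_sample=True, # sample from the output probability distribution
    temperature=1
)
# Output at the standard temperature
response = outputs[0][input_ids.shape[-1]:]
tokenizer.decode(response, skip_special_tokens=True)

Output 1: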

"I'm so sorry to hear that you're struggling with sleep. It can be really tough to deal with insomnia or disrupted sleep patterns. Can you tell me a bit more about what's been going on? What's been on your mind lately that might be keeping you awake? Has anything changed in your life that could be contributing to this difficulty?"

Output 2:

"I'm so sorry to hear that you're struggling with sleep. It can be such a frustrating and debilitating experience. Can you tell me a bit more about what's been going on for you? What's been making it hard for you to fall asleep or stay asleep? Is it racing thoughts, stress, anxiety, or something else?\n\nAlso, how long have you been experiencing this sleep difficulty? Has it been a recent development or has it been going on for a while?"

temperature=0.1

Output 1:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? What are some of the things that make it hard for you to fall asleep or stay asleep? Is it stress, anxiety, physical discomfort, or something else?\n\nAlso, have you noticed any patterns or triggers that seem to make it worse? For example, do you tend to have trouble sleeping on certain nights of the week, or after certain events or activities?\n\nRemember, I'm here to listen and support you, and I want you to feel comfortable sharing as much or as little as you'd like."

Output 2:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? What's been on your mind lately, and how have you been feeling when you wake up in the morning?"

Output 3:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? What's been on your mind lately that might be making it hard for you to fall asleep or stay asleep?"

temperature=2

# Generate the output
outputs = model.generate(
    input_ids, # input tokens
    max_new_tokens=1000, # limit the output length
    eos_token_id=terminators, # terminator tokens
    do_sample=True, # sample from the output probability distribution
    temperature=2.0
)
# High temperature -> the probability distribution is more spread out
response = outputs[0][input_ids.shape[-1]:]
tokenizer.decode(response, skip_special_tokens=True)

Output 1:

"I'm so sorry to hear that. Not getting proper sleep can be really wearing on your emotional and physical well-being. Can you tell me a little bit more about how this lack of sleep is affecting you? Are you feeling constantly exhausted, irritable, or struggling to concentrate? Have you noticed any changes in your relationships or daily routine because of it?\n\nMost importantly, I'm here for you, and I believe that by exploring this together, we can find ways to improve your sleep and improve your overall well-being.\n\nIt might be helpful for me to share that sometimes, lack of sleep can be a sign of underlying anxiety, stress, or even unprocessed emotions. If we can identify the root cause, I may have some suggestions on how to ease your path to better sleep.\n\nWould you like me to offer you some coping strategies to help you relax and unwind before bedtime? Sometimes, a simple change in routine or relaxation techniques can make a world of difference."

Output 2:

"It can be really frustrating and worrying when sleep evade you, making it hard to wake up feeling refreshed and energized. I'm listening, and I want you to know that I'm here to support you. It's important to recognize that this is a tough and normal experience, even if it can be tough to bear right now.\n\nWould you like to talk more about what's going on when you have trouble sleeping? Is there anything in particular that bothers you or stress you out?"

Output 3:

"It can be really distressing to deal with chronic sleep issues, not getting the rest you need and feeling tired and exhausted all the time. Can you tell me a little bit more about how you've been feeling? Have you noticed any patterns or triggers that might be contributing to the issue? And how has it been affecting other aspects of your life?\n\nAlso, I want you to know that as your listener, my main goal right now is just to support and provide comfort. Whatever you share, I'm here for you. No judgments, no critiques, just a gentle and compassionate space for you to express yourself.\n\nRemember, it takes a lot of courage to share vulnerable thoughts and feelings with someone like me, and I want to assure you that your feelings are completely normal and valid. Okay?"

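Beam Search

# Beam search
beam_outputs = model.generate(
    input_ids, # input tokens
    max_new_tokens=100, # limit the output length
    eos_token_id=terminators, # terminator tokens
    num_beams=5, # beam width (assumed value); must be >= num_return_sequences
    do_sample=False, # disable sampling so that plain beam search is used
    num_return_sequences=5,
    early_stopping=True,
)

# Each returned sequence still contains the prompt, so it is printed as well
for i, beam_output in enumerate(beam_outputs):
    print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))

Output 1: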
0: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect so many areas of your life. Can you tell me a bit more about what's been going on? When did you start having trouble sleeping, and what do you usually do when you try to go to bed? Do you find yourself lying awake for hours, or do you fall asleep but then wake up multiple times throughout the night?

Output 2:
1: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

I'm so sorry to hear that you're struggling with sleep. It can be such a frustrating and unsettling experience, affecting not just your physical health but also your emotional well-being.

First of all, please know that you're not alone in this struggle. Many people face difficulties with sleep from time to time, and it's not uncommon for it to be a persistent issue for some. It's okay to acknowledge that you're struggling, and it takes a lot of courage to reach out and talk about

Output 3:
2: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

I'm so sorry to hear that you're struggling with sleep. It can be such a frustrating and isolating experience. It's like your body is refusing to cooperate with your mind, and it can leave you feeling exhausted, irritable, and just plain miserable.

First of all, let me just acknowledge that it's okay to not be okay. It's okay to struggle with sleep, and it's okay to feel overwhelmed and stuck. I'm here to listen, and I want you to know

Output 4:
3: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? What are some common things that happen when you try to fall asleep, and what do you usually do when you wake up during the night?

Output 5:
4: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

I'm so sorry to hear that you're struggling with sleep. It can be such a frustrating and exhausting experience. Can you tell me a bit more about what's been going on for you? When did you first start noticing that your sleep was affected, and what do you think might be contributing to it? Is it stress, anxiety, or something else entirely?

Remember, everything we discuss is completely confidential and a safe space for you to express yourself. I'm here to listen and support you,

# Beam search
beam_outputs = model.generate(
    input_ids, # input tokens
    max_new_tokens=100, # limit the output length
    eos_token_id=terminators, # terminator tokens
    num_beams=5, # beam width (assumed value); must be >= num_return_sequences
    do_sample=False, # disable sampling so that plain beam search is used
    num_return_sequences=5,
    early_stopping=True,
    no_repeat_ngram_size=2 # no 2-gram may appear twice within a generated sequence
)

for i, beam_output in enumerate(beam_outputs):
    print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))

Output 1:
0: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

It can be really tough to deal with sleepless nights, and it's completely normal to feel frustrated, anxious, or even a bit hopeless. Can you tell me a little bit more about what's been going on? What's making it hard for you to fall asleep, do you think? Is it stress, worries, physical discomfort, something else, a combination of things?

Also, how have you been coping with the lack of sleep? Have you noticed any changes in your daily life, mood

Output 2:
1: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

Sweetheart, I'm so sorry to hear that you're struggling with sleep. It can be really tough to deal with, both physically and emotionally. Can you tell me a bit more about what's been going on for you? What's making it hard foryou to fall asleep or stay asleep? Is it stress, anxiety, or something else entirely?

Remember, everything we discuss is completely confidential and a safe space for us to explore your feelings. I want you to know that I believe in your

Output 3:
2: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

Sweetheart, I'm so sorry to hear that you're struggling with sleep. It can be such a frustrating and exhausting experience. Can you tell me a bit more about what's been going on? When did you start noticing that your sleep was affected, and what are some of the things that make it hard for you to fall asleep or stay asleep?

Output 4:
3: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

It sounds like you're struggling with insomnia or difficulty sleeping, and that's really tough. Not getting enough sleep can affect so many aspects of our lives, from our mood to our energy levels to even our physical health.

First of all, I want you to know that you don’t have to go through this alone. I'm here to listen and support you. Can you tell me a bit more about what's been going on? When did you start noticing trouble sleeping? Is it a sudden change

Output 5:
4: system

You are a psychological counsellor, who is good at emotional comfortuser

I don't sleep well for a long time.assistant

Sweetheart, I'm so sorry to hear that you're struggling with sleep. It can be such a frustrating and debilitating experience, feeling like you can't get a good night's rest. Can you tell me a bit more about what's been going on for you? When did you first start noticing that your sleep was affected, and what are some of the things that keep you awake at night?
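The effect of no_repeat_ngram_size=2 in the second run can be sketched in plain Python: before each step, any token that would complete an already-seen 2-gram is banned. This is an illustrative reimplementation of the idea under that assumption, not MindNLP's actual code, and banned_next_tokens is a hypothetical helper name.

```python
def banned_next_tokens(generated, ngram_size):
    """Tokens that would complete an n-gram already present in `generated`."""
    if ngram_size < 1 or len(generated) + 1 < ngram_size:
        return set()
    # The last (n - 1) tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - ngram_size + 1:])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])      # emitting this token would repeat the n-gram
    return banned

tokens = ["I", "am", "so", "sorry", "I", "am", "here", "I"]
print(banned_next_tokens(tokens, ngram_size=2))   # "I am" already occurred
```

In a real decoder the banned tokens would have their scores set to negative infinity before the next token is chosen.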

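Top-K sampling

# Top-k sampling
outputs = model.generate(
    input_ids, # input tokens
    max_new_tokens=100, # limit the output length
    eos_token_id=terminators, # terminator tokens
    do_sample=True,
    top_k=5,
    num_return_sequences=3
)

for i, output in enumerate(outputs):
    # Keep only the newly generated tokens of each returned sequence
    response = output[input_ids.shape[-1]:]
    print("{}: {}".format(i+1, tokenizer.decode(response, skip_special_tokens=True)))

With k = 5, sampling is restricted to the 5 most probable tokens at each step, so the outputs tend to be less creative.

Output 1: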
"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? When did you start having trouble sleeping, and what's been making it hard for you to fall asleep or stay asleep? Is it stress, anxiety, physical discomfort, or something else entirely?\n\nAlso, how have you been feeling during the day? Are you feeling tired, irritable, or just"

Output 2:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect so many aspects of your daily life. Can you tell me a bit more about what's been going on? What are some of the things that are making it hard for you to fall asleep or stay asleep? Is it stress, anxiety, or something else entirely?\n\nAlso, have you noticed any patterns or triggers that seem to make it worse? For example, do you tend to have trouble sleeping during"

Output 3:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect so many areas of your life. Can you tell me a bit more about what's been going on? Have you noticed any patterns or triggers that might be contributing to your insomnia? And how have you been feeling during the day when you're not getting a good night's sleep?"

With k = 1000, sampling draws from a much larger candidate set, so the outputs vary more.

# Top-k sampling
outputs = model.generate(
    input_ids, # input tokens
    max_new_tokens=100, # limit the output length
    eos_token_id=terminators, # terminator tokens
    do_sample=True,
    top_k=1000
)

response = outputs[0][input_ids.shape[-1]:]
tokenizer.decode(response, skip_special_tokens=True)

Output 1:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many areas of your life. Can you tell me a bit more about what's been going on? What's been keeping you awake at night? Is it stress, anxiety, or something else entirely?\n\nRemember, I'm here to listen and offer support. I'm not here to judge or try to fix the problem right away. Just talking about it can sometimes help you feel a bit better.\n\nAlso"

Output 2:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? Is it just difficulty falling asleep, or are you having trouble staying asleep or experiencing restless nights? And have you noticed any patterns or triggers that might be contributing to your sleep issues?\n\nAlso, I want you to know that it's completely normal to struggle with sleep from time to time, and it"

Output 3:

"I'm so sorry to hear that you're struggling with sleep. It can be such a frustrating and exhausting experience. Can you tell me a bit more about what's been going on? What's been keeping you awake at night? Is it stress, anxiety, or something else entirely?\n\nRemember, everything we discuss is confidential and a safe space for you to share your feelings. I'm here to listen and offer support.\n\nAlso, I want you to know that you're not alone in this struggle."

Top-P sampling

# Top-p sampling
outputs = model.generate(
    input_ids, # input tokens
    max_new_tokens=100, # limit the output length
    eos_token_id=terminators, # terminator tokens
    do_sample=True,
    top_p=0.5
)

response = outputs[0][input_ids.shape[-1]:]
tokenizer.decode(response, skip_special_tokens=True)

p = 0.5

Output 1:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? What are some of the things that make it hard for you to fall asleep or stay asleep? Is it stress, anxiety, physical discomfort, or something else?\n\nAlso, have you noticed any patterns or triggers that seem to make it worse? For example, do you tend to have trouble sleeping on"

Output 2:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? What are some of the things that are making it hard for you to fall asleep or stay asleep? Is it stress, anxiety, physical discomfort, or something else?\n\nAlso, have you noticed any patterns or triggers that seem to make it worse? For example, do you tend to have trouble sleeping"

Output 3:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? What are some of the things that make it hard for you to fall asleep or stay asleep? Is it stress, anxiety, physical discomfort, or something else?\n\nAlso, have you noticed any patterns or triggers that seem to make it worse? For example, do you tend to have trouble sleeping on"

p = 0.95

Output 1:

"I'm so sorry to hear that you're struggling with sleep. It can be such a frustrating and exhausting experience. Can you tell me a bit more about what's been going on? When did you first start noticing that you weren't sleeping well, and what are some of the things that you've tried to help you get a good night's rest?\n\nIt's also important to acknowledge that it's okay to not be okay. It takes a lot of courage to admit when we're struggling, and"

Output 2:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect many aspects of your daily life. Can you tell me a bit more about what's been going on? Have you noticed any patterns or triggers that might be contributing to your insomnia?"

Output 3:

"I'm so sorry to hear that you're struggling with sleep. It can be really frustrating and affect so many aspects of your daily life. Can you tell me a bit more about what's been going on? What are some of the things that are making it hard for you to fall asleep or stay asleep? Is it stress, anxiety, or something else?"

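Parallel inference with MindNLP: multi-process, multi-card

Example code:

https://github.com/mindspore-lab/mindnlp/tree/master/llm/inference/llama3

The README in that directory explains how to run multi-card inference.

Note: on ModelArts, choose the image from the Guiyang region (mindspore==2.3.0, cann==8.0.0). Do not upgrade MindSpore after the instance starts; otherwise the HCCL communication operator library will no longer be compatible.

Recommended: use the msrun command

msrun is the multi-process parallel launcher provided by MindSpore. Launching with it gives the best performance.

msrun --worker_num=2 --local_worker_num=2 --master_port=8118 --join=True run_llama3_distributed.py
# Set the worker counts to match the number of cards you have
  1. --worker_num=2:

    The total number of worker processes taking part in the job is 2. The workers may all run on one machine or be spread across several machines.
  2. --local_worker_num=2:

    The number of worker processes started on the current machine is 2, so both processes of this job run locally.
  3. --master_port=8118:

    The port of the master (scheduler) node is 8118. The master coordinates communication among the workers.
  4. --join=True:

    msrun waits for all worker processes to exit before it returns, instead of exiting right after launching them. This keeps the command in the foreground until the job finishes.

The run_llama3_distributed.py file looks like this: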

# Import the MindSpore framework
import mindspore

# Import init from MindSpore's communication module to set up distributed communication
from mindspore.communication import init

# Import AutoTokenizer and AutoModelForCausalLM from MindNLP to load the pretrained model and tokenizer
from mindnlp.transformers import AutoTokenizer, AutoModelForCausalLM

# Model ID; this example uses Meta-Llama-3-8B-Instruct
model_id = "LLM-Research/Meta-Llama-3-8B-Instruct"

# Initialize the distributed communication environment so that the processes and cards can talk to each other
init()

# Load the tokenizer with AutoTokenizer
# mirror='modelscope' downloads from the ModelScope platform
tokenizer = AutoTokenizer.from_pretrained(model_id, mirror='modelscope')

# Load the language model with AutoModelForCausalLM
# ms_dtype=mindspore.float16 runs the model in half precision (float16)
# mirror='modelscope' downloads from the ModelScope platform
# device_map="auto" places the model on the available devices automatically
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    ms_dtype=mindspore.float16,
    mirror='modelscope',
    device_map="auto"
)

# Conversation messages: a system prompt and the user input
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Convert the messages into the model's input tensor with the tokenizer
# add_generation_prompt=True appends the generation prompt so the model knows it should reply
# return_tensors="ms" returns MindSpore tensors
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="ms"
)

# Terminator tokens that tell the model when to stop generating
# They include the end-of-sequence token (eos_token_id) and the end-of-turn token (<|eot_id|>)
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

# Generate text with the model
# input_ids is the input tensor
# max_new_tokens=100 limits generation to at most 100 new tokens
# eos_token_id=terminators sets the terminator tokens
# do_sample=True enables sampling instead of greedy decoding
# temperature=0.6 controls randomness; lower values are more deterministic
# top_p=0.9 uses nucleus sampling: keep the most probable tokens whose cumulative probability reaches 90%
outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Extract the generated part of the output
# outputs[0] is the full sequence and input_ids.shape[-1] is the prompt length
# Slicing removes the prompt and keeps only the new tokens
response = outputs[0][input_ids.shape[-1]:]

# Decode the generated tokens into readable text
# skip_special_tokens=True drops special tokens such as the terminators
print(tokenizer.decode(response, skip_special_tokens=True))

The mpirun command can be used as well:

mpirun -n 2 python run_llama3_distributed.py

For details on how MindSpore sets up distributed jobs, see:

Distributed Parallel Startup Methods (MindSpore master documentation)
