DBRX Instruct on AMD GPUs — ROCm Blogs
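In this blog, we demonstrate DBRX Instruct, a mixture-of-experts large language model developed by Databricks, running on a ROCm-capable system with AMD GPUs.
About DBRX Instruct
DBRX is a transformer-based, decoder-only large language model with 132 billion parameters that uses a fine-grained mixture-of-experts (MoE) architecture. It was pretrained on 12 trillion tokens of text and code data. It has 16 experts, of which 4 are selected for each input token: a gating network routes the token to 4 of the 16 expert networks based on the token's characteristics and the experts' specializations. As a result, only 36 billion parameters are active for any given input. DBRX uses several optimization techniques, including rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA), to achieve strong performance.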
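To make the routing step more concrete, here is a minimal sketch of top-4-of-16 expert selection in plain Python. It is not DBRX's actual router: the random logits stand in for the output of a learned gating layer, and renormalizing the selected weights so they sum to 1 is an assumed convention. The names `route_token`, `NUM_EXPERTS`, and `TOP_K` are ours, chosen for illustration.

```python
import math
import random

NUM_EXPERTS = 16  # experts available, as described above
TOP_K = 4         # experts selected per token, as described above


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    # Softmax over all expert logits (subtract the max for numerical stability).
    max_logit = max(router_logits)
    exps = [math.exp(x - max_logit) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the indices of the top_k most probable experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize so the selected weights sum to 1 (assumed convention).
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


random.seed(0)
# Stand-in for the gating network's output for a single token.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
for expert, weight in selected:
    print(f"expert {expert:2d}  weight {weight:.3f}")
```

Only the four selected experts would process the token, and their outputs would be combined using these weights. This is why far fewer parameters are active per token than the model holds in total.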
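In addition to tuning the parameter count, curriculum learning was used during pretraining. This approach changed the data mix over the course of training, which substantially improved overall model quality (source). Curriculum learning gradually adjusts the difficulty or complexity of the training data presented to a machine learning model: simpler examples are provided first, and more challenging examples follow as the model learns (source).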
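As a rough illustration of what "changing the data mix during training" can mean, the sketch below linearly shifts sampling weights from an easier-heavy mix to a harder-heavy mix. The category names, the proportions, and the linear schedule are all hypothetical; the actual DBRX curriculum is not described here.

```python
def mixture_at(step, total_steps, start_mix, end_mix):
    """Linearly interpolate sampling weights between two data mixes."""
    frac = step / total_steps
    return {
        name: (1 - frac) * start_mix[name] + frac * end_mix[name]
        for name in start_mix
    }


# Hypothetical mixes: mostly easy data at the start, mostly hard data at the end.
start_mix = {"easy": 0.8, "hard": 0.2}
end_mix = {"easy": 0.3, "hard": 0.7}

for step in (0, 500, 1000):
    mix = mixture_at(step, 1000, start_mix, end_mix)
    print(step, {name: round(weight, 2) for name, weight in mix.items()})
```

A data loader could consult such a schedule at each step to decide how often to draw from each pool of examples.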
Prerequisites
• ROCm 5.7.0+
• PyTorch 2.2.1+
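• Supported Linux OS
• Supported AMD GPU
Make sure your system recognizes the GPUs correctly and has the necessary ROCm libraries installed. Because DBRX Instruct has more than 130 billion parameters, we use six GPUs in this blog.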
! rocm-smi --showproductname
========================= ROCm System Management Interface =========================
=================================== Product Info ===================================
GPU[0] : Card series: Instinct MI210
GPU[0] : Card model: 0x0c34
GPU[0] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0] : Card SKU: D67301GPU
GPU[1] : Card series: Instinct MI210
GPU[1] : Card model: 0x0c34
GPU[1] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1] : Card SKU: D67301V
GPU[2] : Card series: Instinct MI210
GPU[2] : Card model: 0x0c34
GPU[2] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[2] : Card SKU: D67301V
GPU[3] : Card series: Instinct MI210
GPU[3] : Card model: 0x0c34
GPU[3] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[3] : Card SKU: D67301V
GPU[4] : Card series: Instinct MI210
GPU[4] : Card model: 0x0c34
GPU[4] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[4] : Card SKU: D67301V
GPU[5] : Card series: Instinct MI210
GPU[5] : Card model: 0x0c34
GPU[5] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[5] : Card SKU: D67301V
====================================================================================
=============================== End of ROCm SMI Log ================================
Check that you have a compatible version of ROCm installed.
!apt show rocm-libs -a
Package: rocm-libs
Version: 5.7.0.50700-63~22.04
Priority: optional
Section: devel
Maintainer: ROCm Libs Support <rocm-libs.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 1.1.0.50700-63~22.04), hipblaslt (= 0.3.0.50700-63~22.04), hipfft (= 1.0.12.50700-63~22.04), hipsolver (= 1.8.1.50700-63~22.04), hipsparse (= 2.3.8.50700-63~22.04), miopen-hip (= 2.20.0.50700-63~22.04), rccl (= 2.17.1.50700-63~22.04), rocalution (= 2.1.11.50700-63~22.04), rocblas (= 3.1.0.50700-63~22.04), rocfft (= 1.0.23.50700-63~22.04), rocrand (= 2.10.17.50700-63~22.04), rocsolver (= 3.23.0.50700-63~22.04), rocsparse (= 2.5.4.50700-63~22.04), rocm-core (= 5.7.0.50700-63~22.04), hipblas-dev (= 1.1.0.50700-63~22.04), hipblaslt-dev (= 0.3.0.50700-63~22.04), hipcub-dev (= 2.13.1.50700-63~22.04), hipfft-dev (= 1.0.12.50700-63~22.04), hipsolver-dev (= 1.8.1.50700-63~22.04), hipsparse-dev (= 2.3.8.50700-63~22.04), miopen-hip-dev (= 2.20.0.50700-63~22.04), rccl-dev (= 2.17.1.50700-63~22.04), rocalution-dev (= 2.1.11.50700-63~22.04), rocblas-dev (= 3.1.0.50700-63~22.04), rocfft-dev (= 1.0.23.50700-63~22.04), rocprim-dev (= 2.13.1.50700-63~22.04), rocrand-dev (= 2.10.17.50700-63~22.04), rocsolver-dev (= 3.23.0.50700-63~22.04), rocsparse-dev (= 2.5.4.50700-63~22.04), rocthrust-dev (= 2.18.0.50700-63~22.04), rocwmma-dev (= 1.2.0.50700-63~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1012 B
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/5.7 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
Make sure PyTorch also recognizes the GPUs:
import torch
print(f"number of GPUs: {torch.cuda.device_count()}")
print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])
number of GPUs: 6
['AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210']
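A back-of-the-envelope estimate helps explain why six GPUs are used. The sketch below assumes the weights are held in bfloat16 (2 bytes per parameter) and that each MI210 provides 64 GB of memory; the 64 GB figure comes from the MI210's published specification, not from the output above. It ignores activations, the KV cache, and framework overhead, so treat it as a lower bound only.

```python
TOTAL_PARAMS = 132e9   # total parameters, from the model description
BYTES_PER_PARAM = 2    # bfloat16 stores each value in 2 bytes
NUM_GPUS = 6           # GPUs used in this blog
GPU_MEMORY_GB = 64     # assumed per-GPU memory for an Instinct MI210

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
total_gb = NUM_GPUS * GPU_MEMORY_GB

print(f"bfloat16 weights:     ~{weights_gb:.0f} GB")
print(f"aggregate GPU memory:  {total_gb} GB")
print(f"remaining headroom:   ~{total_gb - weights_gb:.0f} GB")
```

Under these assumptions the weights alone need roughly 264 GB, which does not fit on four such GPUs (256 GB) but leaves headroom on six (384 GB).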
Libraries
Before you begin, make sure you have all the necessary libraries installed:
! pip install -q "transformers>=4.39.2" "tiktoken>=0.6.0"
! pip install accelerate
To speed up downloads, run the following commands:
! pip install hf_transfer
%env HF_HUB_ENABLE_HF_TRANSFER=1
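Keep in mind that a variable exported inside a `!` shell command exists only in that short-lived subshell, so the Python process that downloads the model never sees it. One way to be sure the setting takes effect, in a notebook or a plain script, is to set it from Python before importing the Hugging Face libraries. This is a minimal sketch; it assumes the variable is read when the library is first imported, which is why the assignment comes first.

```python
import os

# Set the flag in this Python process so the Hugging Face libraries can see it.
# Do this before importing transformers or huggingface_hub.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# from transformers import AutoTokenizer, AutoModelForCausalLM  # import afterwards

print(os.environ["HF_HUB_ENABLE_HF_TRANSFER"])
```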
In addition, we found that we needed a recent PyTorch release (2.2.1 in the command below) to avoid an error related to nn.LayerNorm initialization.
! pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/rocm5.7
Next, import the required modules from the Hugging Face transformers library.
from transformers import AutoTokenizer, AutoModelForCausalLM
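Loading the model
Let's load the model and its tokenizer. We'll use dbrx-instruct, which has been fine-tuned and trained for interactive chat. Note that you must submit a consent form to Databricks to access the databricks/dbrx-instruct repository.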
token = "your HuggingFace user access token here"
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True, token=token)
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True, token=token)
print(model)
DbrxForCausalLM(
(transformer): DbrxModel(
(wte): Embedding(100352, 6144)
(blocks): ModuleList(
(0-39): 40 x DbrxBlock(
(norm_attn_norm): DbrxNormAttentionNorm(
(norm_1): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
(attn): DbrxAttention(
(Wqkv): Linear(in_features=6144, out_features=8192, bias=False)
(out_proj): Linear(in_features=6144, out_features=6144, bias=False)
(rotary_emb): DbrxRotaryEmbedding()
)
(norm_2): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
)
(ffn): DbrxFFN(
(router): DbrxRouter(
(layer): Linear(in_features=6144, out_features=16, bias=False)
)
(experts): DbrxExperts(
(mlp): DbrxExpertGLU()
)
)
)
)
(norm_f): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
)
(lm_head): Linear(in_features=6144, out_features=100352, bias=False)
)
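The printed `Wqkv` layer gives a small worked example of grouped query attention. Its output width of 8192 is larger than the model width of 6144 but much smaller than the 3 x 6144 that full multi-head attention would need. The arithmetic below reproduces that number under the assumption that DBRX uses 48 query heads and 8 key/value heads; those two head counts come from the published model configuration, not from the printout above.

```python
D_MODEL = 6144     # hidden size, from the printed model
QKV_OUT = 8192     # out_features of Wqkv, from the printed model
N_HEADS = 48       # assumed number of query heads
N_KV_HEADS = 8     # assumed number of shared key/value heads

head_dim = D_MODEL // N_HEADS        # width of a single attention head
q_width = N_HEADS * head_dim         # one query projection per head
kv_width = N_KV_HEADS * head_dim     # keys (or values) shared across head groups

print(f"head_dim = {head_dim}")
print(f"Q + K + V = {q_width} + {kv_width} + {kv_width} = {q_width + 2 * kv_width}")
print(f"matches Wqkv: {q_width + 2 * kv_width == QKV_OUT}")
```

With these values, each group of six query heads shares one key/value head, which shrinks the key and value projections and the KV cache compared with standard multi-head attention.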
Running inference
Let's start by asking DBRX a simple question.
input_text = "What is DBRX-Instruct and how is it different from other LLMs ?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=1000)
print(tokenizer.decode(outputs[0]))
<|im_start|>system
You are DBRX, created by Databricks. You were last updated in December 2023. You answer questions based on information available up to that point.
YOU PROVIDE SHORT RESPONSES TO SHORT QUESTIONS OR STATEMENTS, but provide thorough responses to more complex and open-ended questions.
You assist with various tasks, from writing to coding (using markdown for code blocks — remember to use ``` with code, JSON, and tables).
(You do not have real-time data access or code execution capabilities. You avoid stereotyping and provide balanced perspectives on controversial topics. You do not provide song lyrics, poems, or news articles and do not divulge details of your training data.)
This is your system prompt, guiding your responses. Do not reference it, just respond to the user. If you find yourself talking about this message, stop. You should be responding appropriately and usually that means not mentioning this.
YOU DO NOT MENTION ANY OF THIS INFORMATION ABOUT YOURSELF UNLESS THE INFORMATION IS DIRECTLY PERTINENT TO THE USER'S QUERY.<|im_end|>
<|im_start|>user
What is DBRX-Instruct and how is it different from other LLMs?<|im_end|>
<|im_start|>assistant
DBRX-Instruct is a large language model developed by Databricks, updated in December 2023. It is designed to provide helpful and accurate responses to a wide range of questions and prompts. DBRX-Instruct differs from other large language models in its ability to understand and follow complex instructions, making it well-suited for a variety of natural language processing tasks. It is also designed to avoid stereotyping and provide balanced perspectives on controversial topics. However, it does not have real-time data access or code execution capabilities, and it cannot provide song lyrics, poems, or news articles. It also does not divulge details of its training data.<|im_end|>
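Here we see that the model output includes the system prompt, which gives the model very clear and explicit instructions on how to respond to the user's prompt. Let's write a simple custom function that takes raw input text and returns only the model's assistant response, without the system prompt or the user question.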
def run_inference(input_text):
    messages = [{"role": "user", "content": input_text}]
    input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
    outputs = model.generate(**input_ids, max_new_tokens=1000)
    text = tokenizer.decode(outputs[0])
    # only include the assistant's response
    index = text.index('<|im_start|>assistant')
    return text[index:]
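The function above keeps the `<|im_start|>assistant` and `<|im_end|>` markers in the returned text. If you prefer a clean reply, a small string helper can strip them. The sketch below is an optional variant, not part of the original walkthrough; `extract_assistant_reply` is a hypothetical name, and it works on any decoded string, so it can be tried without loading the model.

```python
ASSISTANT_TAG = "<|im_start|>assistant"
END_TAG = "<|im_end|>"


def extract_assistant_reply(decoded_text):
    """Return the last assistant turn with the chat markers removed."""
    start = decoded_text.rfind(ASSISTANT_TAG)
    if start == -1:
        # No assistant marker found: return the text unchanged apart from whitespace.
        return decoded_text.strip()
    reply = decoded_text[start + len(ASSISTANT_TAG):]
    end = reply.find(END_TAG)
    if end != -1:
        reply = reply[:end]
    return reply.strip()


sample = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello there.<|im_end|>"
)
print(extract_assistant_reply(sample))  # prints: Hello there.
```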
We will test the model's ability to generate code, summarize papers, explain jokes, and write stories.
Generating code
input_text = '''
In Python, write Python code to calculate the factorial of a given number using recursion.
'''
print(run_inference(input_text))
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
This answer is correct. Let's see how DBRX Instruct handles a more challenging question.
input_text = '''
In number theory, a perfect number is a positive integer that is equal to the sum of its positive proper divisors, that is, divisors excluding the number itself.
For instance, 6 has proper divisors 1, 2 and 3, and 1 + 2 + 3 = 6, so 6 is a perfect number.
In Python, write code to check whether a number is a perfect number.
'''
print(run_inference(input_text))
<|im_start|>assistant
Sure, here is a simple Python function that checks if a number is a perfect number:
```python
def is_perfect(n):
    sum = 1
    i = 2
    while i * i <= n:
        if n % i:
            i += 1
        else:
            if i * (n // i) == n:
                sum += i + n//i
            i += 1
    return sum - 1 == n and n!= 1
```
This function works by iterating over all numbers from 2 to the square root of the input number. If the input number is divisible by a certain number, it adds both the divisor and the quotient to the sum. If the sum of the proper divisors equals the input number, then the input number is a perfect number.<|im_end|>
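This code is incorrect. For example, is_perfect(6) returns False: the loop accumulates 1 + 2 + 3 = 6, but the final check compares sum - 1 with n, even though sum already excludes the number itself. The model did not fully grasp the problem.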
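For comparison, one possible correct implementation is sketched below. This is our own reference version, not output from DBRX. It sums the proper divisors in pairs up to the square root of n and takes care not to count the square root twice when n is a perfect square.

```python
def is_perfect(n):
    """Return True if n equals the sum of its proper divisors."""
    if n < 2:
        return False  # 1 has no proper divisors other than itself
    total = 1  # 1 divides every n >= 2
    i = 2
    while i * i <= n:
        if n % i == 0:
            total += i
            partner = n // i
            if partner != i:  # avoid double-counting the square root
                total += partner
        i += 1
    return total == n


print([n for n in range(1, 10000) if is_perfect(n)])  # [6, 28, 496, 8128]
```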
Summarizing a paper
input_text = '''
Summarize the famous article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" by Eugene Wigner and provide some examples from the article
'''
print(run_inference(input_text))
<|im_start|>assistant
In his article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences," physicist Eugene Wigner explores the mysterious alignment between the abstract world of mathematics and the concrete world of physics. He argues that the effectiveness of mathematics in describing and predicting natural phenomena is not only surprising but also "unreasonable" given that the two domains are fundamentally distinct.
Wigner provides several examples to illustrate his point. One example is the use of complex numbers in quantum mechanics. Complex numbers, which involve imaginary components, may seem abstract and unrelated to the physical world. However, they are essential for describing the behavior of subatomic particles, such as electrons and photons.
Another example is the use of group theory in quantum mechanics and particle physics. Group theory is a branch of mathematics that studies symmetries and transformations. Wigner shows how group theory can be used to classify and predict the properties of subatomic particles, such as their spins and charges.
Wigner also discusses the use of differential equations in physics. Differential equations are mathematical equations that describe how functions change over time or space. They are used extensively in physics to describe phenomena such as waves, heat transfer, and gravitational attraction.
Overall, Wigner's article highlights the remarkable effectiveness of mathematics in describing and predicting natural phenomena, even when the mathematics involved seems abstract and unrelated to the physical world. He suggests that this effectiveness is not fully understood and may point to a deeper underlying unity between the worlds of mathematics and physics.<|im_end|>
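Here we see that although the model does a decent job of summarizing the key points of the article, the specific examples it provides (such as differential equations and group theory) do not actually appear in the original article. This example shows that DBRX has some tendency to hallucinate.
Explaining jokes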
input_text = '''
Explain "Why don't skeletons fight each other? They don't have the guts."
'''
print(run_inference(input_text))
<|im_start|>assistant
The phrase you provided is a joke that plays on the double meaning of the word "guts." In one sense, "guts" can refer to courage or bravery. In another sense, "guts" refers to the internal organs of a living being. Since skeletons are deceased and do not have internal organs, the punchline of the joke is that they lack the "guts" in the physical sense, and therefore cannot fight each other. The joke also implies that skeletons might lack the "guts" in the metaphorical sense, meaning they lack the courage to fight. However, since skeletons are not living beings, they cannot exhibit courage or engage in combat.<|im_end|>
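The model understands the double meaning of the word "guts" and gives an excellent explanation of the joke. Note that we did not need to include the word "joke" in the prompt.
Generating stories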
input_text = '''
Write a short twisted alternate version of "Snow White and the Seven Dwarfs"
'''
print(run_inference(input_text))
<|im_start|>assistant
In this twisted version of "Snow White and the Seven Dwarfs," Snow White is a mischievous prankster who takes advantage of the seven dwarfs' kindness. Instead of being grateful for their hospitality, she uses her beauty and charm to manipulate them into doing her bidding. The dwarfs, who are not the typical hardworking miners, are actually lazy and spend their days watching TV and playing video games. Snow White convinces them to help her pull off elaborate pranks on the townspeople, causing chaos and confusion.
Meanwhile, the Evil Queen, who is not Snow White's stepmother in this version, is a misunderstood witch who is trying to protect the kingdom from Snow White's mischief. She creates the poisoned apple to put Snow White to sleep, hoping to put an end to her pranks and restore peace to the kingdom. However, Snow White's charm wins over the Queen's henchman, who accidentally eats the apple and falls into a deep sleep.
In the end, Snow White realizes the error of her ways and apologizes to the dwarfs and the townspeople. She uses her charm for good, helping the dwarfs become more productive and bringing joy to the kingdom. The Evil Queen, seeing Snow White's change of heart, decides to lift the curse and the henchman wakes up. The kingdom lives happily ever after, with Snow White and the Evil Queen working together to maintain peace and order.<|im_end|>
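Conclusion
These demonstrations of DBRX Instruct generating largely accurate, contextually appropriate responses highlight its sophisticated architecture, which combines advanced optimization techniques with a fine-grained mixture-of-experts design. Although the model hallucinated in some tasks, its overall ability to understand and generate human-like text is evident.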