Sebastian Raschka 最新博客:从头开始用 Llama 2 构建 Llama 3.2

news2025/1/27 12:59:11

最近已有不少大厂都在秋招宣讲了,也有一些在 Offer 发放阶段。

节前,我们邀请了一些互联网大厂朋友、今年参加社招和校招面试的同学。

针对新手如何入门算法岗、该如何准备面试攻略、面试常考点、大模型技术趋势、算法项目落地经验分享等热门话题进行了深入的讨论。

总结链接如下:

  • 《大模型面试宝典》(2024版) 正式发布

喜欢本文记得收藏、关注、点赞


近日,机器学习研究员 Sebastian Raschka 光速发布长篇教程《Converting Llama 2 to Llama 3.2 From Scratch》。
图片

  • 博文链接:https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb

本文是《 Converting a From-Scratch GPT Architecture to Llama 2》的后续,更新的内容是如何将 Meta 的 Llama 2 架构模型逐步转换为 Llama 3、Llama 3.1 和 Llama 3.2。为了避免不必要的冗长,本文特意将解释部分缩至最短,并将重点放在主代码上。

图片

机器之心对文章内容进行了不改变原意的编译:

1 逐步转换 Llama 模型实现

如果你是初次实施 LLM 架构,建议从《Build a Large Language Model From Scratch》(https://github.com/rasbt/LLMs-from-scratch/blob/0972ded5309c25dc5eecc98b62897d677c6c36c4/ch04/01_main-chapter-code/ch04.ipynb)的第 4 章开始,那部分内容将逐步指导你实施原始 GPT 架构。

然后可参考《Converting a From-Scratch GPT Architecture to Llama 2》(https://github.com/rasbt/LLMs-from-scratch/blob/0972ded5309c25dc5eecc98b62897d677c6c36c4/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb),将实现 Llama 特有的组件,如 RMSNorm 层、SiLU 和 SwiGLU 激活、RoPE(旋转位置嵌入)和 SentencePiece tokenizer。

本笔记本采用 Llama 2 架构,并通过以下方式将其转换为 Llama 3 架构:

  • 修改旋转嵌入

  • 实现分组查询注意力

  • 使用定制版的 GPT-4 tokenizer

随后,我们将 Meta 共享的原始 Llama 3 权重加载到架构中:

1.1 复用 Llama 2 的组件

Llama 2 实际上与 Llama 3 非常相似,如上文所述和本文开头的图片所示。

这意味着我们可以使用以下代码从 Llama 2 笔记本中导入多个构建模块:

import os
import sys
import io
import nbformat
import types
def import_from_notebook():
def import_definitions_from_notebook(fullname, names):
current_dir = os.getcwd()
path = os.path.join(current_dir, fullname + ".ipynb")
path = os.path.normpath(path)
# Load the notebook
if not os.path.exists(path):
raise FileNotFoundError(f"Notebook file not found at: {path}")
with io.open(path, "r", encoding="utf-8") as f:
nb = nbformat.read(f, as_version=4)
# Create a module to store the imported functions and classes
mod = types.ModuleType(fullname)
sys.modules[fullname] = mod
# Go through the notebook cells and only execute function or class definitions
for cell in nb.cells:
if cell.cell_type == "code":
cell_code = cell.source
for name in names:
# Check for function or class definitions
if f"def {name}" in cell_code or f"class {name}" in cell_code:
exec(cell_code, mod.__dict__)
return mod
fullname = "converting-gpt-to-llama2"
names = ["precompute_rope_params", "compute_rope", "SiLU", "FeedForward", "RMSNorm", "MultiHeadAttention"]
return import_definitions_from_notebook(fullname, names)
imported_module = import_from_notebook()
# We need to redefine precompute_rope_params
# precompute_rope_params = getattr(imported_module, "precompute_rope_params", None)
compute_rope = getattr(imported_module, "compute_rope", None)
SiLU = getattr(imported_module, "SiLU", None)
FeedForward = getattr(imported_module, "FeedForward", None)
RMSNorm = getattr(imported_module, "RMSNorm", None)
# MultiHeadAttention only for comparison purposes
MultiHeadAttention = getattr(imported_module, "MultiHeadAttention", None)

1.2 修改后的 RoPE

Llama 3 使用的 RoPE 与 Llama 2 相似,可参阅 RoPE 论文(https://arxiv.org/abs/2104.09864)。

不过,二者 RoPE 设置有一些细微差别。Llama 3 现在支持多达 8192 个 token,是 Llama 2(4096)的两倍。

RoPE 的基础值(见下文公式),从 10000(Llama 2)增加到 50000(Llama 3),公式如下(改编自 RoPE 论文):

图片

这些值是一组预定义的参数,用于确定旋转矩阵中的旋转角度,其中的维数是嵌入空间的维数。

将基数从 10000 增加到 50000,频率(或旋转角度)在各维度上的衰减速度会更慢,这意味着维度越高,角度越大(本质上,这是对频率的解压缩)。

此外,我们还在下面的代码中引入了一个 freq_config 部分,用于调整频率;不过,在 Llama 3(只有 Llama 3.1 和 Llama 3.2)中并不需要它,所以稍后会重新访问这个 freq_config(默认设置为「无」并被忽略)。

import torch
def precompute_rope_params(head_dim, theta_base=10000, context_length=4096, freq_config=None):
assert head_dim % 2 == 0, "Embedding dimension must be even"
# Compute the inverse frequencies
inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim // 2) / (head_dim // 2)))
################################ NEW ###############################################
# Frequency adjustments
if freq_config is not None:
low_freq_wavelen = freq_config["original_context_length"] / freq_config["low_freq_factor"]
high_freq_wavelen = freq_config["original_context_length"] / freq_config["high_freq_factor"]
wavelen = 2 * torch.pi / inv_freq
inv_freq_llama = torch.where(
wavelen > low_freq_wavelen, inv_freq / freq_config["factor"], inv_freq
)
smooth_factor = (freq_config["original_context_length"] / wavelen - freq_config["low_freq_factor"]) / (
freq_config["high_freq_factor"] - freq_config["low_freq_factor"]
)
smoothed_inv_freq = (
(1 - smooth_factor) * (inv_freq / freq_config["factor"]) + smooth_factor * inv_freq
)
is_medium_freq = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
inv_freq_llama = torch.where(is_medium_freq, smoothed_inv_freq, inv_freq_llama)
inv_freq = inv_freq_llama
####################################################################################
# Generate position indices
positions = torch.arange(context_length)
# Compute the angles
angles = positions[:, None] * inv_freq[None, :]  # Shape: (context_length, head_dim // 2)
# Expand angles to match the head_dim
angles = torch.cat([angles, angles], dim=1)  # Shape: (context_length, head_dim)
# Precompute sine and cosine
cos = torch.cos(angles)
sin = torch.sin(angles)
return cos, sin

总之,与 Llama 2 相比,Llama 3 的新功能是「上下文长度」和 theta 基底参数:

# Instantiate RoPE parameters
llama_2_context_len = 4096
llama_3_context_len = 8192
llama_2_theta_base = 10_000
llama_3_theta_base = 50_000

在 Llama 2 中,用法与以前相同:

# Settings
batch_size = 2
num_heads = 4
head_dim = 16
# Instantiate RoPE parameters
cos, sin = precompute_rope_params(
head_dim=head_dim,
theta_base=llama_3_theta_base,
context_length=llama_3_context_len
)
# Dummy query and key tensors
torch.manual_seed(123)
queries = torch.randn(batch_size, llama_3_context_len, num_heads, head_dim)
keys = torch.randn(batch_size, llama_3_context_len, num_heads, head_dim)
# Apply rotary position embeddings
queries_rot = compute_rope(queries, cos, sin)
keys_rot = compute_rope(keys, cos, sin)

1.3 分组查询注意力

本节将用一种名为分组查询注意力(GQA)的替代机制来取代多头注意力(MHA)。简而言之,可以将 GQA 视为计算和参数效率更高的 MHA 版本。

在 GQA 中,通过在多个注意力头之间共享来减少键和值投影的数量,每个注意力头仍有其独特的查询,但这些查询关注同一组键和值。

下面是具有 2 个 key-value 组的 GQA 示例:

图片

GQA 的主要思想是减少与键值对相关的唯一查询组的数量,从而在不显著降低建模性能的情况下,减少 MHA 中某些矩阵乘法的大小和参数的数量。

简而言之,GQA 的主要变化是每个查询组都需要重复,以匹配与之相关的头数量,具体实现如下:

import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    def __init__(
            self, d_in, d_out, context_length, num_heads,
            num_kv_groups,       # NEW
            rope_base=10_000,    # NEW
            rope_config=None,    # NEW
            dtype=None
        ):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        assert num_heads % num_kv_groups == 0, "num_heads must be divisible by num_kv_groups"

        self.d_out = d_out
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads

        ############################# NEW  #############################
        # self.W_key = nn.Linear(d_in, d_out, bias=False, dtype=dtype)
        # self.W_value = nn.Linear(d_in, d_out, bias=False, dtype=dtype)
        self.W_key = nn.Linear(d_in, num_kv_groups * self.head_dim, bias=False, dtype=dtype)
        self.W_value = nn.Linear(d_in, num_kv_groups * self.head_dim, bias=False, dtype=dtype)
        self.num_kv_groups = num_kv_groups
        self.group_size = num_heads // num_kv_groups
        ################################################################

        self.W_query = nn.Linear(d_in, d_out, bias=False, dtype=dtype)
        self.out_proj = nn.Linear(d_out, d_out, bias=False, dtype=dtype)

        self.register_buffer("mask", torch.triu(torch.ones(context_length, context_length), diagonal=1))
        cos, sin = precompute_rope_params(
            head_dim=self.head_dim,
            theta_base=rope_base,      # NEW
            freq_config=rope_config,   # NEW
            context_length=8192
        )
        self.register_buffer("cos", cos)
        self.register_buffer("sin", sin)

    def forward(self, x):
        b, num_tokens, d_in = x.shape

        queries = self.W_query(x)  # Shape: (b, num_tokens, d_out)
        keys = self.W_key(x)  # Shape: (b, num_tokens, num_kv_groups * head_dim)
        values = self.W_value(x)  # Shape: (b, num_tokens, num_kv_groups * head_dim)

        # Reshape queries, keys, and values
        queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)

        ##################### NEW  #####################
        # keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)
        # values = values.view(b, num_tokens, self.num_heads, self.head_dim)
        keys = keys.view(b, num_tokens, self.num_kv_groups, self.head_dim)
        values = values.view(b, num_tokens, self.num_kv_groups, self.head_dim)
        ################################################

        # Transpose keys, values, and queries
        keys = keys.transpose(1, 2)  # Shape: (b, num_heads, num_tokens, head_dim)
        values = values.transpose(1, 2)  # Shape: (b, num_heads, num_tokens, head_dim)
        queries = queries.transpose(1, 2)  # Shape: (b, num_query_groups, num_tokens, head_dim)

        # Apply RoPE
        keys = compute_rope(keys, self.cos, self.sin)
        queries = compute_rope(queries, self.cos, self.sin)

        ##################### NEW  #####################
        # Expand keys and values to match the number of heads
        # Shape: (b, num_heads, num_tokens, head_dim)

        keys = keys.repeat_interleave(self.group_size, dim=1)  # Shape: (b, num_heads, num_tokens, head_dim)
        values = values.repeat_interleave(self.group_size, dim=1)  # Shape: (b, num_heads, num_tokens, head_dim)
        # For example, before repeat_interleave along dim=1 (query groups):
        #   [K1, K2]
        # After repeat_interleave (each query group is repeated group_size times):
        #   [K1, K1, K2, K2]
        # If we used regular repeat instead of repeat_interleave, we'd get:
        #   [K1, K2, K1, K2]
        ################################################

        # Compute scaled dot-product attention (aka self-attention) with a causal mask
        # Shape: (b, num_heads, num_tokens, num_tokens)
        attn_scores = queries @ keys.transpose(2, 3)  # Dot product for each head

        # Original mask truncated to the number of tokens and converted to boolean
        mask_bool = self.mask.bool()[:num_tokens, :num_tokens]

        # Use the mask to fill attention scores
        attn_scores.masked_fill_(mask_bool, -torch.inf)

        attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1)
        assert keys.shape[-1] == self.head_dim

        # Shape: (b, num_tokens, num_heads, head_dim)
        context_vec = (attn_weights @ values).transpose(1, 2)

        # Combine heads, where self.d_out = self.num_heads * self.head_dim
        context_vec = context_vec.reshape(b, num_tokens, self.d_out)
        context_vec = self.out_proj(context_vec)  # optional projection

        return context_vec

参数节省的情况,请参考以下来自 GPT 和 Llama 2 代码的多头注意力示例:

# Settings
batch_size = 1
context_len = 3000
max_context_len = 8192
embed_dim = 4096
num_heads = 32
example_batch = torch.randn((batch_size, context_len, embed_dim))
mha = MultiHeadAttention(
d_in=embed_dim,
d_out=embed_dim,
context_length=max_context_len,
num_heads=num_heads
)
mha(example_batch)
print("W_key:", mha.W_key.weight.shape)
print("W_value:", mha.W_value.weight.shape)
print("W_query:", mha.W_query.weight.shape)
W_key: torch.Size([4096, 4096])
W_value: torch.Size([4096, 4096])
W_query: torch.Size([4096, 4096])

现在,如果改用分组查询注意力,并使用 8 个 kv 组(Llama 3 8B 使用了 8 个 kv 组),可以看到 key 和 value 矩阵的行数减少了 4 倍(因为 32 个注意力头除以 8 个 kv 组就是 4):

gqa = GroupedQueryAttention(
d_in=embed_dim,
d_out=embed_dim,
context_length=max_context_len,
num_heads=num_heads,
num_kv_groups=8,
rope_base=llama_3_theta_base
)
gqa(example_batch)
print("W_key:", gqa.W_key.weight.shape)
print("W_value:", gqa.W_value.weight.shape)
print("W_query:", gqa.W_query.weight.shape)
W_key: torch.Size([1024, 4096])
W_value: torch.Size([1024, 4096])
W_query: torch.Size([4096, 4096])

顺便提一下,为了使分组查询注意力等同于标准的多头注意力,可以将查询组的数量(num_kv_groups)设置为与头的数量(num_heads)相等。

最后,比较一下下面的参数数量:

print("Total number of parameters:")
mha_total_params = sum(p.numel() for p in mha.parameters())
print(f"MHA: {mha_total_params:,}")
gqa_total_params = sum(p.numel() for p in gqa.parameters())
print(f"GQA: {gqa_total_params:,}")
Total number of parameters:
MHA: 67,108,864
GQA: 41,943,040
# Free up memory:
del mha
del gqa

1.4 更新 TransformerBlock 模块

接下来,更新 Transformer 块。在这里,只需将 MultiHeadAttention 与 GroupedQueryAttention 互换,并添加新的 RoPE 设置:

class TransformerBlock(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.att =  GroupedQueryAttention(  # MultiHeadAttention(
            d_in=cfg["emb_dim"],
            d_out=cfg["emb_dim"],
            context_length=cfg["context_length"],
            num_heads=cfg["n_heads"],
            num_kv_groups=cfg["n_kv_groups"],  # NEW
            rope_base=cfg["rope_base"],        # NEW
            rope_config=cfg["rope_freq"],      # NEW
            dtype=cfg["dtype"]
        )
        self.ff = FeedForward(cfg)
        self.norm1 = RMSNorm(cfg["emb_dim"], eps=1e-5)
        self.norm2 = RMSNorm(cfg["emb_dim"], eps=1e-5)

    def forward(self, x):
        # Shortcut connection for attention block
        shortcut = x
        x = self.norm1(x)
        x = self.att(x.to(torch.bfloat16))   # Shape [batch_size, num_tokens, emb_size]
        x = x + shortcut  # Add the original input back

        # Shortcut connection for feed-forward block
        shortcut = x
        x = self.norm2(x)
        x = self.ff(x.to(torch.bfloat16))
        x = x + shortcut  # Add the original input back

        return x

1.5 定义模型类

幸运的是,在设置模型类时,我们不需要做太多,只需将名称更新为 Llama3Model

# class Llama2Model(nn.Module):
class Llama3Model(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"], dtype=cfg["dtype"])

        self.trf_blocks = nn.Sequential(
            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])

        self.final_norm = RMSNorm(cfg["emb_dim"], eps=1e-5)
        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False, dtype=cfg["dtype"])

    def forward(self, in_idx):
        batch_size, seq_len = in_idx.shape
        tok_embeds = self.tok_emb(in_idx)
        x = tok_embeds
        x = self.trf_blocks(x)
        x = self.final_norm(x)
        logits = self.out_head(x.to(torch.bfloat16))
        return logits

2 初始化模型

现在,我们可以定义一个 Llama 3 配置文件(为便于比较,显示的是 Llama 2 配置文件):

LLAMA2_CONFIG_7B = {
"vocab_size": 32_000,    # Vocabulary size
"context_length": 4096,  # Context length
"emb_dim": 4096,         # Embedding dimension
"n_heads": 32,           # Number of attention heads
"n_layers": 32,          # Number of layers
"hidden_dim": 11_008,    # Size of the intermediate dimension in FeedForward
"dtype": torch.bfloat16  # Lower-precision dtype to save memory
}
LLAMA3_CONFIG_8B = {
"vocab_size": 128_256,   # NEW: Larger vocabulary size
"context_length": 8192,  # NEW: Larger context length
"emb_dim": 4096,         # Embedding dimension
"n_heads": 32,           # Number of attention heads
"n_layers": 32,          # Number of layers
"hidden_dim": 14_336,    # NEW: Larger size of the intermediate dimension in FeedForward
"n_kv_groups": 8,        # NEW: Key-Value groups for grouped-query attention
"rope_base": 50_000,     # NEW: The base in RoPE's "theta" was increased to 50_000
"rope_freq": None,       # NEW: Additional configuration for adjusting the RoPE frequencies
"dtype": torch.bfloat16  # Lower-precision dtype to save memory
}

使用这些设置,我们现在可以初始化 Llama 3 8B 模型。

请注意,这需要约 34 GB 内存(作为对比,Llama 2 7B 需要约 26 GB 内存)

model = Llama3Model(LLAMA3_CONFIG_8B)
total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of parameters: {total_params:,}")
Total number of parameters: 8,030,261,248

如上图所示,模型包含 80 亿个参数。此外,我们还可以使用下面的代码计算该模型的内存需求:

def model_memory_size(model, input_dtype=torch.float32):
    total_params = 0
    total_grads = 0
    for param in model.parameters():
        # Calculate total number of elements per parameter
        param_size = param.numel()
        total_params += param_size
        # Check if gradients are stored for this parameter
        if param.requires_grad:
            total_grads += param_size

    # Calculate buffer size (non-parameters that require memory)
    total_buffers = sum(buf.numel() for buf in model.buffers())

    # Size in bytes = (Number of elements) * (Size of each element in bytes)
    # We assume parameters and gradients are stored in the same type as input dtype
    element_size = torch.tensor(0, dtype=input_dtype).element_size()
    total_memory_bytes = (total_params + total_grads + total_buffers) * element_size

    # Convert bytes to gigabytes
    total_memory_gb = total_memory_bytes / (1024**3)

    return total_memory_gb

print(f"float32 (PyTorch default): {model_memory_size(model, input_dtype=torch.float32):.2f} GB")
print(f"bfloat16: {model_memory_size(model, input_dtype=torch.bfloat16):.2f} GB")
float32 (PyTorch default): 68.08 GB
bfloat16: 34.04 GB

最后,如果适用,我们还可以将模型转移到 NVIDIA 或 Apple Silicon GPU 上:

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

model.to(device);

3 加载 tokenizer

在本节中,我们将为模型加载 tokenizer。

Llama 2 使用了谷歌的 SentencePiece tokenizer ,而不是 OpenAI 基于 Tiktoken 库的 BPE tokenizer 。然而,Llama 3 恢复使用 Tiktoken 的 BPE tokenizer;具体来说,它使用的是具有扩展词汇的 GPT-4 tokenizer。我们可以在 Meta AI 的官方 Llama 3 存储库中找到最初的 Tiktoken 适配程序。

下面重写了 tokenizer 的代码,使其更易读,更适合本笔记本使用(但表现应该是相似的):

import os
from pathlib import Path

import tiktoken
from tiktoken.load import load_tiktoken_bpe


class Tokenizer:
    def __init__(self, model_path):
        assert os.path.isfile(model_path), f"Model file {model_path} not found"
        mergeable_ranks = load_tiktoken_bpe(model_path)
        num_base_tokens = len(mergeable_ranks)

        self.special_tokens = {
            "<|begin_of_text|>": 128000,
            "<|end_of_text|>": 128001,
            "<|start_header_id|>": 128006,
            "<|end_header_id|>": 128007,
            "<|eot_id|>": 128009,
        }
        self.special_tokens.update({
            f"<|reserved_{i}|>": 128002 + i for i in range(256) if (128002 + i) not in self.special_tokens.values()
        })

        self.model = tiktoken.Encoding(
            name=Path(model_path).name,
            pat_str=r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+",
            mergeable_ranks=mergeable_ranks,
            special_tokens=self.special_tokens
        )


    def encode(self, text, bos=False, eos=False, allowed_special=set(), disallowed_special=()):
        if bos:
            tokens = [self.special_tokens["<|begin_of_text|>"]]
        else:
            tokens = []

        tokens += self.model.encode(text, allowed_special=allowed_special, disallowed_special=disallowed_special)

        if eos:
            tokens.append(self.special_tokens["<|end_of_text|>"])
        return tokens

    def decode(self, tokens):
        return self.model.decode(tokens)

Meta AI 在 Hugging Face Hub 上共享了 Llama 3 模型的原始权重和 tokenizer 词库。

我们将首先从 Hub 下载 tokenizer 词库,并将其加载到上述代码中。请注意,Meta AI 要求你在下载文件前接受 Llama 3 许可条款;为此必须创建一个 Hugging Face Hub 账户,并访问 meta-llama/Meta-Llama-3-8B 存储库以接受条款。

接下来,需要创建一个访问 token;要生成一个具有「读取」权限的访问 token,请点击右上角的个人资料图片,然后点击「设置」。

图片

然后,创建并复制访问 token,以便复制并粘贴到下一个代码单元中:

图片

from huggingface_hub import login
import json
with open("config.json", "r") as config_file:
config = json.load(config_file)
access_token = config["HF_ACCESS_TOKEN"]
login(token=access_token)
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful

通过访问 token 登录(这是验证我们是否接受 Llama 3 许可条款所必需的)后,就可以下载 tokenizer 词库了:

from huggingface_hub import hf_hub_download
tokenizer_file_path = hf_hub_download(
repo_id="meta-llama/Meta-Llama-3-8B",
filename="original/tokenizer.model",
local_dir="llama3-files"
)

请注意,在使用 Llama 3 文件时,我们可能需要 blobfile 软件包,它用于处理存储在云存储解决方案(如 Google Cloud Storage (GCS)、Azure Blob Storage 或 Amazon S3)中的数据集或模型。

可以通过取消注释并执行下面的 pip 命令来安装此依赖包:

# pip install blobfile
tokenizer = Tokenizer(tokenizer_file_path)

现在,我们可以使用生成函数让 Llama 3 模型生成新文本:

from previous_chapters import generate, text_to_token_ids, token_ids_to_text
torch.manual_seed(123)
token_ids = generate(
model=model,
idx=text_to_token_ids("Every effort", tokenizer).to(device),
max_new_tokens=30,
context_size=LLAMA3_CONFIG_8B["context_length"],
top_k=1,
temperature=0.
)
print("Output text:\n", token_ids_to_text(token_ids, tokenizer))
Output text:
 Every effort_dead aeros Ingredients başında.extension clangmissions.esp 사진 Ek Pars til DoctorsDaoеньostivan normal Ekized � Ekized � Ek rdr tık%,orgen>',

当然,正如我们在上面看到的,这段文字是毫无意义的,因为我们还没有训练过 Llama 3 模型。在下一节中,我们将从 Meta AI 中加载预训练的权重,而不是自己训练模型,因为这将花费数万至数十万美元。

4 加载预训练权重

我们将加载下面的「meta-llama/Meta-Llama-3-8B 」base 模型,它是微调前的简单文本补全模型。

或者,你也可以加载经过指令微调和对齐的「meta-llama/Meta-Llama-3-8B-Instruct」模型,方法是相应修改下一个代码单元中的字符串。加起来,权重文件大约有 16 GB 大。

from safetensors.torch import load_file

combined_weights = {}

for i in range(1, 5):
    weights_file = hf_hub_download(
        repo_id="meta-llama/Meta-Llama-3-8B",
        filename=f"model-0000{i}-of-00004.safetensors",
        local_dir="llama3-files"
    )
    current_weights = load_file(weights_file)
    combined_weights.update(current_weights)
model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]
model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]
model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]
model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

权重包含以下张量(为简单起见,只显示前 15 个张量):

list(combined_weights.keys())[:15]
['model.embed_tokens.weight',
 'model.layers.0.input_layernorm.weight',
 'model.layers.0.mlp.down_proj.weight',
 'model.layers.0.mlp.gate_proj.weight',
 'model.layers.0.mlp.up_proj.weight',
 'model.layers.0.post_attention_layernorm.weight',
 'model.layers.0.self_attn.k_proj.weight',
 'model.layers.0.self_attn.o_proj.weight',
 'model.layers.0.self_attn.q_proj.weight',
 'model.layers.0.self_attn.v_proj.weight',
 'model.layers.1.input_layernorm.weight',
 'model.layers.1.mlp.down_proj.weight',
 'model.layers.1.mlp.gate_proj.weight',
 'model.layers.1.mlp.up_proj.weight',
 'model.layers.1.post_attention_layernorm.weight']

下面的函数仿照《Build a Large Language Model From Scratch》第 5 章(https://github.com/rasbt/LLMs-from-scratch/blob/0972ded5309c25dc5eecc98b62897d677c6c36c4/ch05/01_main-chapter-code/ch05.ipynb)中的 load_weights_into_gpt 函数,将预训练好的权重加载到 Llama 3 模型中:

def assign(left, right, tensor_name="unknown"):
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch in tensor '{tensor_name}'. Left: {left.shape}, Right: {right.shape}")

    if isinstance(right, torch.Tensor):
        return torch.nn.Parameter(right.clone().detach())
    else:
        return torch.nn.Parameter(torch.tensor(right))


def load_weights_into_llama(model, param_config, params):
    model.tok_emb.weight = assign(model.tok_emb.weight, params["model.embed_tokens.weight"], "model.embed_tokens.weight")

    for l in range(param_config["n_layers"]):

        # Load attention weights
        model.trf_blocks[l].att.W_query.weight = assign(
            model.trf_blocks[l].att.W_query.weight,
            params[f"model.layers.{l}.self_attn.q_proj.weight"],
            f"model.layers.{l}.self_attn.q_proj.weight"
        )
        model.trf_blocks[l].att.W_key.weight = assign(
            model.trf_blocks[l].att.W_key.weight,
            params[f"model.layers.{l}.self_attn.k_proj.weight"],
            f"model.layers.{l}.self_attn.k_proj.weight"
        )
        model.trf_blocks[l].att.W_value.weight = assign(
            model.trf_blocks[l].att.W_value.weight,
            params[f"model.layers.{l}.self_attn.v_proj.weight"],
            f"model.layers.{l}.self_attn.v_proj.weight"
        )
        model.trf_blocks[l].att.out_proj.weight = assign(
            model.trf_blocks[l].att.out_proj.weight,
            params[f"model.layers.{l}.self_attn.o_proj.weight"],
            f"model.layers.{l}.self_attn.o_proj.weight"
        )
        model.trf_blocks[l].norm1.weight = assign(
            model.trf_blocks[l].norm1.weight,
            params[f"model.layers.{l}.input_layernorm.weight"],
            f"model.layers.{l}.input_layernorm.weight"
        )

        # Load FeedForward weights
        model.trf_blocks[l].ff.fc1.weight = assign(
            model.trf_blocks[l].ff.fc1.weight,
            params[f"model.layers.{l}.mlp.gate_proj.weight"],
            f"model.layers.{l}.mlp.gate_proj.weight"
        )
        model.trf_blocks[l].ff.fc2.weight = assign(
            model.trf_blocks[l].ff.fc2.weight,
            params[f"model.layers.{l}.mlp.up_proj.weight"],
            f"model.layers.{l}.mlp.up_proj.weight"
        )
        model.trf_blocks[l].ff.fc3.weight = assign(
            model.trf_blocks[l].ff.fc3.weight,
            params[f"model.layers.{l}.mlp.down_proj.weight"],
            f"model.layers.{l}.mlp.down_proj.weight"
        )
        model.trf_blocks[l].norm2.weight = assign(
            model.trf_blocks[l].norm2.weight,
            params[f"model.layers.{l}.post_attention_layernorm.weight"],
            f"model.layers.{l}.post_attention_layernorm.weight"
        )

    # Load output layer weights
    model.final_norm.weight = assign(model.final_norm.weight, params["model.norm.weight"], "model.norm.weight")

    if "lm_head.weight" in params.keys():
        model.out_head.weight = assign(model.out_head.weight, params["lm_head.weight"], "lm_head.weight")
    else:
        model.out_head.weight = assign(model.out_head.weight, params["model.embed_tokens.weight"], "model.embed_tokens.weight")
        print("Model uses weight tying.")


load_weights_into_llama(model, LLAMA3_CONFIG_8B, combined_weights)
model.to(device);
del combined_weights  # free up memory

接下来,我们就可以使用该模型生成文本了:

torch.manual_seed(123)

token_ids = generate(
    model=model,
    idx=text_to_token_ids("Every effort", tokenizer).to(device),
    max_new_tokens=25,
    context_size=LLAMA3_CONFIG_8B["context_length"],
    top_k=1,
    temperature=0.
)

print("Output text:\n", token_ids_to_text(token_ids, tokenizer))
Output text:
 Every effort has been made to trace copyright holders and to obtain their permission for the use of copyright material. The publisher apologizes for any

5 使用指令微调模型

上面我们使用的是经过预训练的基础模型,如果你想使用一个能够遵循指令的模型,请使用「meta-llama/Llama-3-8b-Instruct」模型,如下所示:

# to free up memory
import gc
del model
gc.collect()  # Run Python garbage collector
if torch.cuda.is_available():
torch.cuda.empty_cache()
combined_weights = {}
for i in range(1, 5):
weights_file = hf_hub_download(
repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
filename=f"model-0000{i}-of-00004.safetensors",
local_dir="llama3-files"
)
current_weights = load_file(weights_file)
combined_weights.update(current_weights)
model = Llama3Model(LLAMA3_CONFIG_8B)
load_weights_into_llama(model, LLAMA3_CONFIG_8B, combined_weights)
model.to(device)
del combined_weights  # free up memory
model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s] 
model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]
model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]
model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

请注意,Llama 3 模型最好与微调时使用的正确提示模板一起使用。

下面是一个基于 Meta AI 的 Llama 3 专用 ChatFormat 代码的 tokenizer wrapper 类,用于构建提示模板:

class ChatFormat:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def encode_header(self, message):
        tokens = []
        tokens.append(self.tokenizer.special_tokens["<|start_header_id|>"])
        tokens.extend(self.tokenizer.encode(message["role"], bos=False, eos=False))
        tokens.append(self.tokenizer.special_tokens["<|end_header_id|>"])
        tokens.extend(self.tokenizer.encode("\n\n", bos=False, eos=False))
        return tokens

    def encode(self, text):
        message = {
            "role": "user",
            "content": text
        }

        tokens = self.encode_header(message)
        tokens.extend(
            self.tokenizer.encode(message["content"].strip(), bos=False, eos=False)
        )
        tokens.append(self.tokenizer.special_tokens["<|eot_id|>"])
        return tokens

    def decode(self, token_ids):
        return self.tokenizer.decode(token_ids)


chat_tokenizer = ChatFormat(tokenizer)

用法如下:

token_ids = chat_tokenizer.encode("Hello World!")
print(token_ids)
[128006, 882, 128007, 271, 9906, 4435, 0, 128009]
tokenizer.decode(token_ids)
'<|start_header_id|>user<|end_header_id|>\n\nHello World!<|eot_id|>'

现在,让我们来看看 Llama 3 教学模式的实际应用:

import re
torch.manual_seed(123)
token_ids = generate(
model=model,
idx=text_to_token_ids("What do llamas eat?", chat_tokenizer).to(device),
max_new_tokens=150,
context_size=LLAMA3_CONFIG_8B["context_length"],
top_k=1,
temperature=0.
)
output_text = token_ids_to_text(token_ids, tokenizer)
def clean_text(text, header_end="assistant<|end_header_id|>\n\n"):
# Find the index of the first occurrence of "<|end_header_id|>"
index = text.find(header_end)
if index != -1:
# Return the substring starting after "<|end_header_id|>"
return text[index + len(header_end):].strip()  # Strip removes leading/trailing whitespace
else:
# If the token is not found, return the original text
return text
print("Output text:\n", clean_text(output_text))
Output text:
 Llamas are herbivores, which means they primarily eat plants and plant-based foods. Here are some of the things llamas like to eat:

1. Grass: Llamas love to graze on grass, especially in the spring and summer months.
2. Hay: Hay is a staple in a llama's diet. They like to eat timothy hay, alfalfa hay, and other types of hay.
3. Grains: Llamas may also be fed grains like oats, barley, and corn. However, grains should not make up more than 10% of a llama's diet.
4. Fruits and vegetables: Llamas may enjoy fruits and vegetables as treats, such as apples,

Llama 3.1 8B

在 Llama 3 发布几个月后,Meta AI 又推出了 Llama 3.1 模型套件(详见 Llama 3.1 官方介绍)。

方便的是,我们可以重复使用之前的 Llama 3 代码来实现 Llama 3.1 8B:

图片

结构完全相同,唯一的变化是重新调整了 RoPE 频率,如下配置文件所示:

LLAMA3_CONFIG_8B = {
"vocab_size": 128_256,   # Vocabulary size
"context_length": 8192,  # Context length
"emb_dim": 4096,         # Embedding dimension
"n_heads": 32,           # Number of attention heads
"n_layers": 32,          # Number of layers
"hidden_dim": 14_336,    # Size of the intermediate dimension in FeedForward
"n_kv_groups": 8,        # Key-Value groups for grouped-query attention
"rope_base": 50_000,     # The base in RoPE's "theta"
"rope_freq": None,       # Additional configuration for adjusting the RoPE frequencies
"dtype": torch.bfloat16  # Lower-precision dtype to save memory
}
LLAMA31_CONFIG_8B = {
"vocab_size": 128_256,    # Vocabulary size
"context_length": 8192,   # Context length
"emb_dim": 4096,          # Embedding dimension
"n_heads": 32,            # Number of attention heads
"n_layers": 32,           # Number of layers
"hidden_dim": 14_336,     # Size of the intermediate dimension in FeedForward
"n_kv_groups": 8,         # Key-Value groups for grouped-query attention
"rope_base": 50_000,      # The base in RoPE's "theta"
"dtype": torch.bfloat16,  # Lower-precision dtype to save memory
"rope_freq": {            # NEW: RoPE frequency scaling
"factor": 8.0,
"low_freq_factor": 1.0,
"high_freq_factor": 4.0,
"original_context_length": 8192,
}
}

正如我们之前在代码中看到的,RoPE 方法使用正弦函数(正弦和余弦)将位置信息直接嵌入注意力机制中。

在 Llama 3.1 中,通过附加配置,我们对反向频率计算进行了额外调整。这些调整会影响不同频率成分对位置嵌入的贡献。

让我们在实践中试试 Llama 3.1 模型;首先,我们清除旧模型,以释放一些 GPU 内存:

# free up memory
del model

gc.collect()  # Run Python garbage collector

if torch.cuda.is_available():
    torch.cuda.empty_cache()

接下来,我们下载 tokenizer。

请注意,由于 Llama 3.1 系列与 Llama 3 系列不同,、必须访问 meta-llama/Llama-3.1-8Brepository,并确认许可条款,这样 Hugging Face 访问 token 才能在下载时起作用。

简单起见,我们在下面只加载 base 模型,但也有一个经过指令微调的版本,你可以将「meta-llama/Llama-3.1-8B」替换为「meta-llama/Llama-3.1-8B-Instruct」。

tokenizer_file_path = hf_hub_download(
    repo_id="meta-llama/Llama-3.1-8B",
    filename="original/tokenizer.model",
    local_dir="llama3-files"
)

tokenizer = Tokenizer(tokenizer_file_path)
model = Llama3Model(LLAMA31_CONFIG_8B)

total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of parameters: {total_params:,}")
Total number of parameters: 8,030,261,248
combined_weights = {}

for i in range(1, 5):
    weights_file = hf_hub_download(
        repo_id="meta-llama/Llama-3.1-8B",
        filename=f"model-0000{i}-of-00004.safetensors",
        local_dir="llama3-files"
    )
    current_weights = load_file(weights_file)
    combined_weights.update(current_weights)

load_weights_into_llama(model, LLAMA31_CONFIG_8B, combined_weights)
model.to(device);
model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]
model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]
model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]
model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]
torch.manual_seed(123)

token_ids = generate(
    model=model,
    idx=text_to_token_ids("Every effort", tokenizer).to(device),
    max_new_tokens=25,
    context_size=LLAMA31_CONFIG_8B["context_length"],
    top_k=1,
    temperature=0.
)

print("Output text:\n", token_ids_to_text(token_ids, tokenizer))
Output text:
 Every effort has been made to trace copyright holders and to obtain their permission for the use of copyright material. The publisher apologizes for any

Llama 3.2 1B

截至本文撰写之时,Meta AI 的最新模型是此处公布的 Llama 3.2 模型。

Llama 3.2 文本模型的代码与 Llama 3.1 相似,只是缩小了模型的大小(有 1B 和 3B 版本)。

另一个效率上的调整是,他们又增加了权重绑定(GPT-2 架构中最初使用的概念);在这里,他们在输入(token)嵌入层和输出层中重复使用相同的权重参数值。

Llama 3.2 1B 的模型体积小,甚至可以在许多移动设备上运行,因此非常方便。

Llama 3.1 8B 和 Llama 3.2 1B 在结构上的差异如下图所示:

图片

从上图可以看出,Llama 3.1 8B 和 Llama 3.2 1B 架构的主要区别在于各自的尺寸。

一个小的额外变化是增加了 RoPE rescaling 系数,这反映在下面的配置文件中:

LLAMA31_CONFIG_8B = {
"vocab_size": 128_256,    # Vocabulary size
"context_length": 8192,   # Context length
"emb_dim": 4096,          # Embedding dimension
"n_heads": 32,            # Number of attention heads
"n_layers": 32,           # Number of layers
"hidden_dim": 14_336,     # Size of the intermediate dimension in FeedForward
"n_kv_groups": 8,         # Key-Value groups for grouped-query attention
"rope_base": 50_000,      # The base in RoPE's "theta"
"dtype": torch.bfloat16,  # Lower-precision dtype to save memory
"rope_freq": {          # RoPE frequency scaling
"factor": 8.0,
"low_freq_factor": 1.0,
"high_freq_factor": 4.0,
"original_context_length": 8192,
}
}
LLAMA32_CONFIG_1B = {
"vocab_size": 128_256,    # Vocabulary size
"context_length": 8192,   # Context length
"emb_dim": 2048,          # NEW: Half the embedding dimension
"n_heads": 32,            # Number of attention heads
"n_layers": 16,           # NEW: Half the number of layers
"hidden_dim": 8192,      # NEW: Almopst half the size of the intermediate dimension in FeedForward
"n_kv_groups": 8,         # Key-Value groups for grouped-query attention
"rope_base": 50_000,      # The base in RoPE's "theta"
"dtype": torch.bfloat16,  # Lower-precision dtype to save memory
"rope_freq": {            # RoPE frequency scaling
"factor": 32.0,       # NEW: Adjustment of the rescaling factor
"low_freq_factor": 1.0,
"high_freq_factor": 4.0,
"original_context_length": 8192,
}
}

下面,我们可以重复使用 Llama 3.1 8B 部分的代码来加载 Llama 3.2 1B 模型。

同样,由于 Llama 3.2 系列有别于 Llama 3.1 系列,因此必须访问 meta-llama/Llama-3.2-1B 软件源并确认许可条款。

简单起见,我们只在下面加载基本模型,但也有一个经过指令微调的版本,可以用 「meta-llama/Llama-3.2-1B-Instruct」替换「meta-llama/Llama-3.2-1B」。

# free up memory
del model
gc.collect()  # Run Python garbage collector
if torch.cuda.is_available():
torch.cuda.empty_cache()
tokenizer_file_path = hf_hub_download(
repo_id="meta-llama/Llama-3.2-1B",
filename="original/tokenizer.model",
local_dir="llama32-files"
)
tokenizer = Tokenizer(tokenizer_file_path)
model = Llama3Model(LLAMA32_CONFIG_1B)
total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of parameters: {total_params:,}")
# Account for weight tying
total_params_normalized = total_params - model.tok_emb.weight.numel()
print(f"\nTotal number of unique parameters: {total_params_normalized:,}")
Total number of parameters: 1,498,482,688

Total number of unique parameters: 1,235,814,400
weights_file = hf_hub_download(
repo_id="meta-llama/Llama-3.2-1B",
filename=f"model.safetensors",
local_dir="llama32-files"
)
current_weights = load_file(weights_file)
load_weights_into_llama(model, LLAMA32_CONFIG_1B, current_weights)
model.to(device);
Model uses weight tying.
print("Weight tying:", torch.equal(model.tok_emb.weight, model.out_head.weight))
Weight tying: True
torch.manual_seed(123)
token_ids = generate(
model=model,
idx=text_to_token_ids("Every effort", tokenizer).to(device),
max_new_tokens=25,
context_size=LLAMA32_CONFIG_1B["context_length"],
top_k=1,
temperature=0.
)
print("Output text:\n", token_ids_to_text(token_ids, tokenizer))
Output text:
 Every effort is made to ensure that the information on this website is accurate. However, we cannot guarantee that the information is accurate, complete

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2192480.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

自动驾驶-问题笔记-待解决

参考线的平滑方法 参考线平滑算法主要有三种&#xff1a; 离散点平滑&#xff1b;螺旋曲线平滑&#xff1b;多项式平滑&#xff1b; 参考链接&#xff1a;参考线平滑 对于平滑方法&#xff0c;一直不太理解平滑、拟合以及滤波三者的作用与区别&#xff1b; 规划的起点&#x…

代码随想录一刷完结

非常偶然的机会让我看到这个算法训练营的存在&#xff0c;虽然我也没有多大的动力&#xff0c;但当时就觉得没什么事情&#xff0c;想着刷刷题&#xff0c;为以后找工作打打基础。 收获 提示&#xff1a;在刷题过程中的收获 第一次使用CSDN记录&#xff0c;每次有别人点赞和收…

【React】事件机制

事件机制 react 基于浏览器的事件机制自身实现了一套事件机制&#xff0c;称为合成事件。比如&#xff1a;onclick -> onClick 获取原生事件&#xff1a;e.nativeEvent onClick 并不会将事件代理函数绑定到真实的 DOM节点上&#xff0c;而是将所有的事件绑定到结构的最外层…

【LeetCode: 134. 加油站 | 贪心算法】

&#x1f680; 算法题 &#x1f680; &#x1f332; 算法刷题专栏 | 面试必备算法 | 面试高频算法 &#x1f340; &#x1f332; 越难的东西,越要努力坚持&#xff0c;因为它具有很高的价值&#xff0c;算法就是这样✨ &#x1f332; 作者简介&#xff1a;硕风和炜&#xff0c;…

AI模型部署初认识

AI部署这个词儿大家肯定不陌生&#xff0c;可能有些小伙伴还不是很清楚这个是干嘛的&#xff0c;但总归是耳熟能详了。 近些年来&#xff0c;在深度学习算法已经足够卷卷卷之后&#xff0c;深度学习的另一个偏向于工程的方向–部署工业落地&#xff0c;才开始被谈论的多了起来…

C语言 | Leetcode C语言题解之第456题132模式

题目&#xff1a; 题解&#xff1a; int upper_bound(int* vec, int vecSize, int target) {int low 0, high vecSize - 1;if (vec[high] > target) {return -1;}while (low < high) {int mid (high - low) / 2 low;int num vec[mid];if (num > target) {low m…

IDEA基础开发配置以及和git的联动

1.1方向一&#xff1a;工具介绍 我今天要介绍的就是学习Java大部分情况下都会选择的一款工具-----IDEA&#xff0c;这个和我们熟悉的这个pycharm一样&#xff0c;都是属于这个Jetbrains公司的&#xff0c;虽然我对于这个并不是很了解&#xff0c;但是确实知道一点&#xff0c;…

静止坐标系和旋转坐标系变换的线性化,锁相环线性化通用推导

将笛卡尔坐标系的电压 [ U x , U y ] [U_x, U_y] [Ux​,Uy​] 通过旋转变换(由锁相环角度 θ P L L \theta_{PLL} θPLL​ 控制)转换为 dq 坐标系下的电压 [ U d , U q ] [U_d, U_q] [Ud​,Uq​]。这个公式是非线性的,因为它涉及到正弦和余弦函数。 图片中的推导过程主要…

一款基于 Java 的可视化 HTTP API 接口快速开发框架,干掉 CRUD,效率爆炸(带私活源码)

平常我们经常需要编写 API&#xff0c;但其实常常只是一些简单的增删改查&#xff0c;写这些代码非常枯燥无趣。 今天给大家带来的是一款基于 Java 的可视化 HTTP API 接口快速开发框架&#xff0c;通过 UI 界面编写接口&#xff0c;无需定义 Controller、Service、Dao 等 Jav…

使用 Python 进行大规模数据处理

在 Python 中&#xff0c;处理大量数据时&#xff0c;效率是非常重要的。当你有一个包含 100 万个元素的列表&#xff0c;每个元素都是一个字典&#xff0c;并且需要将它们转换为 DataFrame 时&#xff0c;Pandas 是一个很好的工具。Pandas 是 Python 数据处理和分析的强大库&a…

一键生成PPT的AI工具-Kimi!

一键生成PPT的AI工具-Kimi&#xff01; 前言介绍Kimi为什么选择Kimi如何使用Kimi在线编辑PPT下载生成的PPT自己编辑 结语 &#x1f600;大家好&#xff01;我是向阳&#x1f31e;&#xff0c;一个想成为优秀全栈开发工程师的有志青年&#xff01; &#x1f4d4;今天不来讨论前后…

yolov5-7.0模型DNN加载函数及参数详解(重要)

yolov5-7.0模型DNN加载函数及参数详解&#xff08;重要&#xff09; 引言yolov5&#xff08;v7.0&#xff09;1&#xff0c;yolov5.h(加载对应模型里面的相关参数要更改)2&#xff0c;main主程序&#xff08;1&#xff09;加载网络&#xff08;2&#xff09;检测推理&#xff0…

超酷!任务栏美化 给任务栏添加一个好看的时钟

如何给任务栏美化&#xff1f;今天我们这个主题就是帮大家美化任务栏&#xff0c;估计很多伙伴都用过任务栏美化工具。任务栏美化是非常有个性化的功能&#xff0c;不但可以让你的任务栏变得漂亮&#xff0c;还可以增加一些非常有创意的功能&#xff0c;比如今天小编要给大家带…

文件共享软件推荐,哪些工具最实用?

预计到2025年文档共享市场将增长至近100亿美金。文件共享软件助力跨区域协作&#xff0c;推荐ZohoWorkDrive、GoogleDrive、DropboxBusiness。软件设计直观&#xff0c;上手易&#xff0c;可保障数据安全&#xff0c;选择时需考虑企业规模、需求及预算。 一、什么是文件共享软件…

linux部署NFS和autofs自动挂载

目录 &#xff08;一&#xff09;NFS&#xff1a; 1. 什么是NFS 2. NFS守护进程 3. RPC服务 4. 原理 5. 部署 5.1 安装NFS服务 5.2 配置防火墙 5.3 创建服务端共享目录 5.4 修改服务端配置文件 (1). /etc/exports (2). nfs.conf 5.5 启动nfs并加入自启 5.6 客户端…

陀螺仪LSM6DSV16X与AI集成(14)----上报匿名上位机

陀螺仪LSM6DSV16X与AI集成.14--上报匿名上位机 概述视频教学样品申请源码下载硬件准备上位机通讯陀螺仪工作方式欧拉角数据的转换数据帧填充校验和计算数据发送演示开启INT中断中断读取传感器数据主程序演示 概述 本文介绍了如何将 LSM6DSV16X 传感器的姿态数据通过匿名通信协…

【Android】Handler消息机制

文章目录 前言概述核心组件概述Android消息机制概述 Android消息机制分析ThreadLocal的工作原理ThreadLocal基础ThreadLocal实现原理 MessageQueueLooperHandler的工作原理总结 前言 本文用于记录Android的消息机制&#xff0c;主要是指Handler的运行机制。部分内容参考自《An…

产品经理都会的ComfyUI搭建指南

最近准备参加一个ComfyUI的活动&#xff0c;发现还没有上手过ComfyUI&#xff0c;于是先部署起来。ComfyUI是一个基于Stable Diffusion开发的UI。比起WebUI表单式交互的简单&#xff0c;ComfyUI主打灵活&#xff0c;Diffusion Model管线中的各个模块如&#xff1a;VAE、Control…

DINOv2: Learning Robust Visual Featureswithout Supervision

Abstract 在自然语言处理方面的模型&#xff0c;可以产生通用视觉特征&#xff08;即无需微调即可跨图像分布和任务工作的特征&#xff09;来极大地简化任何系统中图像的使用。这些模型能够提取出一些可以在不同类型的图像和任务中通用的视觉特征。这意味着不管图像的来源&…

电脑断网或者经常断网怎么办?

1、首先&#xff0c;按一下键盘的win R &#xff0c; 在打开的运行框内输入&#xff1a;cmd 然后按一下回车 或者 点击一下【确定】 2、在命令窗口输入&#xff1a;ipconfig/release , 然后按一下回车 作用&#xff1a;IP释放&#xff0c;相当于把网线拔了重新插上 3、接着…