LLM in Practice: Unsloth + Llama3, an Acceleration Tool for LLM Fine-Tuning


1. Background

After the May Day holiday, I (本qiang~) dove back into the technical ocean of LLMs. This installment introduces a fine-tuning accelerator for LLMs: Unsloth.

As Unsloth's official tagline puts it: "Easily finetune & train LLMs; Get faster with unsloth." It significantly speeds up LLM fine-tuning, and it also markedly reduces GPU memory usage.

One caveat: the open-source part of Unsloth currently supports only single-machine fine-tuning; more efficient training requires the paid Unsloth Pro.

2. An Overview of Unsloth

2.1 Key Features

(1) All kernels are written in OpenAI's Triton language, with a manually implemented backpropagation engine. Triton is designed for accelerating LLM training.

(2) Zero loss in accuracy: no approximation methods are used, so the computation is exactly equivalent.

(3) No hardware changes required. It supports Nvidia GPUs from 2018 onward (V100, T4, Titan V, RTX 20/30/40 series, A100, H100, L40, etc.; GTX 1070/1080 also work, but slowly). The minimum supported CUDA compute capability is 7.0.

(4) Works on Linux, and on Windows via WSL.

(5) Supports 4-bit and 16-bit QLoRA/LoRA fine-tuning via the bitsandbytes package (a minimal loading sketch follows this list).

(6) The open-source version delivers up to a 5x training speedup; Unsloth Pro claims up to 30x.
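To make the loading pattern in (5) concrete, here is a minimal sketch of Unsloth's 4-bit QLoRA entry point. The model name 'unsloth/llama-3-8b-bnb-4bit' is an assumed example of one of Unsloth's pre-quantized Hugging Face checkpoints; a local path works as well.

from unsloth import FastLanguageModel

# Load a pre-quantized 4-bit checkpoint; bitsandbytes handles the quantization.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name='unsloth/llama-3-8b-bnb-4bit',  # assumed example checkpoint
    max_seq_length=2048,
    dtype=None,         # None lets Unsloth pick float16/bfloat16 for the GPU
    load_in_4bit=True,  # set to False for 16-bit LoRA instead of 4-bit QLoRA
)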

2.2 Currently Supported Models

Because the low-level operators must be rewritten in Triton, adapting some open-source models can take a while. Models currently supported by Unsloth include Qwen1.5 (7B, 14B, 32B, 72B), Llama3-8B, Mistral-7B, Gemma-7B, Phi-3 (3.8B), and TinyLlama, along with training recipes such as ORPO and DPO Zephyr.

2.3 Acceleration Results

The Qwen1.5-7B integration was contributed and validated by the author of Firefly, showing a 30%+ performance improvement and a 40%+ reduction in GPU memory usage; see reference 2 for details.

2.4 Installation

conda create --name unsloth_env python=3.10

conda activate unsloth_env

conda install pytorch-cuda=<12.1/11.8> pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

pip install --no-deps trl peft accelerate bitsandbytes
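After installation, a quick sanity check confirms the environment is usable (a minimal sketch; it only verifies that the imports resolve and that a CUDA device is visible):

import torch
from unsloth import FastLanguageModel  # fails here if unsloth is not installed

print(torch.__version__, torch.version.cuda)  # PyTorch build and CUDA version
print(torch.cuda.is_available())              # True if a GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))      # e.g. 'Tesla P40'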

3. Hands-On

In the spirit of "seeing something a thousand times is no match for doing it once," I (本qiang~) built a comparison experiment around Unsloth. The test environments are P40, A40, and A800 GPUs, and the model under comparison is the freshly released Llama3 (8B).

3.1 Comparison Dimensions

The comparison covers the following dimensions:

- GPU: whether bf16 is supported
- Maximum sequence length: max_seq_length
- Batch size: per_device_train_batch_size
- Gradient accumulation steps: gradient_accumulation_steps
- LoRA rank: r
- Dropout: lora_dropout
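The bf16 dimension can be checked directly in PyTorch; this is the same call the training scripts below use to toggle fp16/bf16:

import torch

# Ampere-class cards (A40, A800) support bf16; the Pascal-era P40 does not.
print(torch.cuda.is_bf16_supported())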

3.2 Source Code

from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments, TextStreamer, AutoModelForCausalLM, set_seed, AutoTokenizer, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig, prepare_model_for_kbit_training
import gc

set_seed(42)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

    ### Instruction:
    {}

    ### Input:
    {}

    ### Response:
    {}"""


def train_unsloth(dtype,
                  max_seq_length,
                  per_device_train_batch_size, 
                  gradient_accumulation_steps, 
                  rank,  
                  lora_alpha=16, 
                  lora_dropout=0, 
                  max_steps=50, 
                  save_steps=50,
                  seed=42,
                  warmup_steps=5,
                  learning_rate=2e-4,
                  logging_steps=5):
	"""
	使用unsloth进行微调训练
	"""
    print(f'dtype:{dtype}, max_seq_length:{max_seq_length}, per_device_train_batch_size:{per_device_train_batch_size}, gradient_accumulation_steps:{gradient_accumulation_steps}, rank:{rank}, lora_dropout:{lora_dropout}')
    load_in_4bit = True

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name='pretrain_models/llama/llama3-8B-Instruct',
        max_seq_length=max_seq_length,
        dtype=dtype,
        load_in_4bit=load_in_4bit
    )

    model = FastLanguageModel.get_peft_model(
        model,
        r = rank,
        target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'],
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        bias='none',
        use_gradient_checkpointing=True,
        random_state=seed,
        use_rslora=False
    )

    EOS_TOKEN = tokenizer.eos_token

    def formatting_prompts_func(examples):
        instructions = examples["instruction"]
        inputs       = examples["input"]
        outputs      = examples["output"]
        texts = []
        for instruction, input, output in zip(instructions, inputs, outputs):
            # Must add EOS_TOKEN, otherwise your generation will go on forever!
            text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
            texts.append(text)
        return { "text" : texts}

    dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
    dataset = dataset.map(formatting_prompts_func, batched = True)

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field='text',
        max_seq_length=max_seq_length,
        packing=False,
        args = TrainingArguments(
            per_device_train_batch_size=per_device_train_batch_size,
            gradient_accumulation_steps=gradient_accumulation_steps,
            warmup_steps=warmup_steps,
            learning_rate=learning_rate,
            fp16 = not torch.cuda.is_bf16_supported(),
            bf16 = torch.cuda.is_bf16_supported(),
            logging_steps=logging_steps,
            optim='adamw_8bit',
            weight_decay=0.01,
            lr_scheduler_type='linear',
            seed=seed,
            output_dir='output/llama3-8b-instruct-unsloth',
            save_steps=save_steps,
            max_steps=max_steps
        )
    )

    gpu_stats = torch.cuda.get_device_properties(0)
    start_gpu_memory = round(torch.cuda.max_memory_reserved()/1024/1024/1024, 3)
    max_memory = round(gpu_stats.total_memory/1024/1024/1024, 3)
    print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
    print(f"{start_gpu_memory} GB of memory reserved.")

    trainer_stats = trainer.train()

    used_memory = round(torch.cuda.max_memory_reserved()/1024/1024/1024, 3)
    used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
    used_percentage = round(used_memory/max_memory*100, 3)
    lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
    print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
    print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
    print(f"Peak reserved memory = {used_memory} GB.")
    print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
    print(f"Peak reserved memory % of max memory = {used_percentage} %.")
    print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

    model.save_pretrained("output/llame3-8b-instruct-unsloth-lora") # Local saving
    tokenizer.save_pretrained("output/llame3-8b-instruct-unsloth-lora")

    # model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)  # Merge to 16bit
    # model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",) # Merge to 4bit
    # model.save_pretrained_merged("model", tokenizer, save_method = "lora",) # Just LoRA adapters
    # model.save_pretrained_gguf("model", tokenizer,)   # Save to 8bit Q8_0
    # model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")   # Save to 16bit GGUF
    # model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")    # Save to q4_k_m GGUF
    del model
    del tokenizer

    torch.cuda.empty_cache()
    for _ in range(3):
        gc.collect()

def train_trans(dtype, 
                max_seq_length, 
                per_device_train_batch_size, 
                gradient_accumulation_steps, 
                rank, 
                lora_alpha=16, 
                lora_dropout=0, 
                max_steps=50, 
                save_steps=50,
                seed=42,
                warmup_steps=5,
                learning_rate=2e-4,
                logging_steps=5):
	"""
	使用transformers进行微调训练
	"""
    print(f'dtype:{dtype}, max_seq_length:{max_seq_length}, per_device_train_batch_size:{per_device_train_batch_size}, gradient_accumulation_steps:{gradient_accumulation_steps}, rank:{rank}, lora_dropout:{lora_dropout}')

    model_path = 'pretrain_models/llama/llama3-8B-Instruct'
    
    tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side='right', model_max_length=8192)
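    # Llama3's tokenizer ships without a pad token, so reuse one of its reserved special tokens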
    tokenizer.add_special_tokens({"pad_token" : '<|reserved_special_token_250|>'})
    tokenizer.pad_token = '<|reserved_special_token_250|>'

    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=dtype,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=dtype,
        quantization_config=quantization_config
    )

    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
    model.enable_input_require_grads()

    config = LoraConfig(
        r=rank,
        lora_alpha=lora_alpha,
        target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'],
        lora_dropout=lora_dropout,
        bias="none",
        task_type="CAUSAL_LM",
        use_rslora=False
    )

    model = get_peft_model(model, peft_config=config)
    model.gradient_checkpointing_enable()

    EOS_TOKEN = tokenizer.eos_token


    def formatting_prompts_func(examples):
        instructions = examples["instruction"]
        inputs       = examples["input"]
        outputs      = examples["output"]
        texts = []
        for instruction, input, output in zip(instructions, inputs, outputs):
            # Must add EOS_TOKEN, otherwise your generation will go on forever!
            text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
            texts.append(text)
        return { "text" : texts}

    dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
    dataset = dataset.map(formatting_prompts_func, batched = True,)

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field='text',
        max_seq_length=max_seq_length,
        packing=False,
        args = TrainingArguments(
            per_device_train_batch_size=per_device_train_batch_size,
            gradient_accumulation_steps=gradient_accumulation_steps,
            warmup_steps=warmup_steps,
            learning_rate=learning_rate,
            fp16 = not torch.cuda.is_bf16_supported(),
            bf16 = torch.cuda.is_bf16_supported(),
            logging_steps=logging_steps,
            optim='adamw_8bit',
            weight_decay=0.01,
            lr_scheduler_type='linear',
            seed=seed,
            # Separate directory so the baseline does not overwrite the unsloth run
            output_dir='output/llama3-8b-instruct-trans',
            save_steps=save_steps,
            max_steps=max_steps
        )
    )

    gpu_stats = torch.cuda.get_device_properties(0)
    start_gpu_memory = round(torch.cuda.max_memory_reserved()/1024/1024/1024, 3)
    max_memory = round(gpu_stats.total_memory/1024/1024/1024, 3)
    print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
    print(f"{start_gpu_memory} GB of memory reserved.")

    trainer_stats = trainer.train()

    used_memory = round(torch.cuda.max_memory_reserved()/1024/1024/1024, 3)
    used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
    used_percentage = round(used_memory/max_memory*100, 3)
    lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
    print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
    print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
    print(f"Peak reserved memory = {used_memory} GB.")
    print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
    print(f"Peak reserved memory % of max memory = {used_percentage} %.")
    print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

    model.save_pretrained("output/llama3-8b-instruct-trans-lora") # Local saving
    tokenizer.save_pretrained("output/llama3-8b-instruct-trans-lora")
    
    del model
    del tokenizer

    torch.cuda.empty_cache()
    for _ in range(3):
        gc.collect()

def infer():

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name='output/llama3-8b-instruct-unsloth-lora',
        max_seq_length=2048,
        dtype=torch.float16,
        load_in_4bit=True
    )

    # Enable Unsloth's 2x faster inference path
    FastLanguageModel.for_inference(model)

    inputs = tokenizer([alpaca_prompt.format('Continue the Fibonacci sequence.', '1, 1, 2, 3, 5, 8', '')], return_tensors = "pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True)
    print(tokenizer.batch_decode(outputs))

    text_streamer = TextStreamer(tokenizer)
    outputs = model.generate(**inputs, max_new_tokens=1024, streamer=text_streamer)
    print(tokenizer.batch_decode(outputs))


if __name__ == '__main__':

    train_unsloth(dtype=torch.bfloat16, max_seq_length=1024, per_device_train_batch_size=1, gradient_accumulation_steps=16, rank=8, lora_dropout=0)
    train_unsloth(dtype=torch.bfloat16, max_seq_length=1024, per_device_train_batch_size=1, gradient_accumulation_steps=16, rank=64, lora_dropout=0)
    train_unsloth(dtype=torch.bfloat16, max_seq_length=2048, per_device_train_batch_size=1, gradient_accumulation_steps=16, rank=64, lora_dropout=0)
    train_unsloth(dtype=torch.bfloat16, max_seq_length=2048, per_device_train_batch_size=4, gradient_accumulation_steps=4, rank=64, lora_dropout=0)
    train_unsloth(dtype=torch.bfloat16, max_seq_length=2048, per_device_train_batch_size=4, gradient_accumulation_steps=4, rank=64, lora_dropout=0.05)
    train_unsloth(dtype=torch.bfloat16, max_seq_length=2048, per_device_train_batch_size=16, gradient_accumulation_steps=4, rank=64, lora_dropout=0.05)
    
    train_trans(dtype=torch.bfloat16, max_seq_length=1024, per_device_train_batch_size=1, gradient_accumulation_steps=16, rank=8, lora_dropout=0)
    train_trans(dtype=torch.bfloat16, max_seq_length=1024, per_device_train_batch_size=1, gradient_accumulation_steps=16, rank=64, lora_dropout=0)
    train_trans(dtype=torch.bfloat16, max_seq_length=2048, per_device_train_batch_size=1, gradient_accumulation_steps=16, rank=64, lora_dropout=0)
    train_trans(dtype=torch.bfloat16, max_seq_length=2048, per_device_train_batch_size=4, gradient_accumulation_steps=4, rank=64, lora_dropout=0)
    train_trans(dtype=torch.bfloat16, max_seq_length=2048, per_device_train_batch_size=4, gradient_accumulation_steps=4, rank=64, lora_dropout=0.05)

4. Experimental Results

4.1 P40

4.2 A40

4.3 A800

4.4 Conclusions

Comparing Unsloth-based training of Llama3-8B against training with the plain transformers framework, the conclusions are as follows:

(1) With Unsloth integrated, GPU memory usage is indeed lower and training is indeed faster, across every dimension tested.

(2) On the P40, increasing batch_size raises GPU memory usage but also lengthens training time, indicating that the P40's performance degrades on large batches. On the A40 and A800, increasing batch_size raises memory usage but shortens training time.

(3) With batch_size 1, the A800 trains no faster than the A40; when batch_size rises to 16, the A800 is nearly twice as fast as the A40. The A800 is therefore better suited to large-batch workloads; for small batch sizes there is no point using a sledgehammer to crack a nut.
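As a reading aid for conclusions (2) and (3): the effective batch size is per_device_train_batch_size × gradient_accumulation_steps. A quick sketch over the configurations tested above:

# Effective batch size for each (per_device_train_batch_size, gradient_accumulation_steps) pair
for bs, gas in [(1, 16), (4, 4), (16, 4)]:
    print(f'per_device={bs}, grad_accum={gas} -> effective batch size {bs * gas}')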

5. Summary

One sentence suffices~

This article presents efficient fine-tuning experiments on Llama3 using the Unsloth framework, with complete comparison code and an analysis of the results.

A follow-up comparison experiment on Qwen1.5 is coming; stay tuned~

6. References

1. Unsloth: https://github.com/unslothai/unsloth

2. Qwen1.5 + Unsloth: "Support Qwen2" by yangjianxin1, unslothai/unsloth PR #428: https://github.com/unslothai/unsloth/pull/428

