快速体验LLaMA3模型微调(超算互联网平台国产异构加速卡DCU)

news2024/9/24 13:24:02

序言

本文以 LLaMA-Factory 为例,在超算互联网平台SCNet上使用异构加速卡AI 显存64GB PCIE,对 Llama3-8B-Instruct 模型进行 LoRA 微调推理合并

超算互联网平台
异构加速卡AI 显存64GB PCIE

一、参考资料

github仓库代码:LLaMA-Factory
使用最新的代码分支:v0.8.3

二、重要说明

  1. 遇到包冲突时,可使用 pip install --no-deps -e . 解决。

  2. 测试PyTorch是否支持DCU:

    (llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
    odels/LLaMA-Factory# python
    Python 3.10.8 (main, Nov  4 2022, 13:48:29) [GCC 11.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> torch.cuda.is_available()
    True
    
  3. pip软件包

    环境内缺失的依赖可以到光合社区内查找,或者直接从平台预置的常用依赖包路径下查找 /public/software/apps/DeepLearning/whl/dtk-24.04,直接cp到用户项目路径下,直接pip安装。

  4. pip不安装依赖包,只安装指定包,防止包冲突。

    # 例如
    pip install --no-dependencies modelscope
    

三、准备环境

1. 系统镜像

异构加速卡AI为国产加速卡(DCU),基于DTK软件栈(对标NVIDIA的CUDA),请选择 dtk24.04 版本的镜像环境。

jupyterlab-pytorch:2.1.0-ubuntu20.04-dtk24.04-py310 镜像为例。

2. 软硬件依赖

特别注意:要求最低版本 transformers 4.41.2vllm 0.4.3

必需项至少推荐
python3.83.11
torch1.13.12.3.0
=transformers4.41.24.41.2
datasets2.16.02.19.2
accelerate0.30.10.30.1
peft0.11.10.11.1
trl0.8.60.9.4
可选项至少推荐
CUDA11.612.2
deepspeed0.10.00.14.0
bitsandbytes0.39.00.43.1
vllm0.4.30.4.3
flash-attn2.3.02.5.9

3. 克隆base环境

root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# conda create -n llama_factory_torch --clone base
Source:      /opt/conda
Destination: /opt/conda/envs/llama_factory_torch
The following packages cannot be cloned out of the root environment:
 - https://repo.anaconda.com/pkgs/main/linux-64::conda-23.7.4-py310h06a4308_0
Packages: 44
Files: 53489

Downloading and Extracting Packages


Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate llama_factory_torch
#
# To deactivate an active environment, use
#
#     $ conda deactivate

4. 安装 LLaMA Factory

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# source activate llama_factory_torch
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip install -e ".[torch,metrics]"
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
...
Checking if build backend supports build_editable ... done
Building wheels for collected packages: llamafactory
  Building editable for llamafactory (pyproject.toml) ... done
  Created wheel for llamafactory: filename=llamafactory-0.8.4.dev0-0.editable-py3-none-any.whl size=20781 sha256=70c0480e2b648516e0eac3d39371d4100cbdaa1f277d87b657bf2adec9e0b2be
  Stored in directory: /tmp/pip-ephem-wheel-cache-uhypmj_8/wheels/e9/b4/89/f13e921e37904ee0c839434aad2d7b2951c2c68e596667c7ef
Successfully built llamafactory
DEPRECATION: lmdeploy 0.1.0-git782048c.abi0.dtk2404.torch2.1. has a non-standard version number. pip 24.1 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of lmdeploy or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: mmcv 2.0.1-gitc0ccf15.abi0.dtk2404.torch2.1. has a non-standard version number. pip 24.1 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of mmcv or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063
Installing collected packages: pydub, jieba, urllib3, tomlkit, shtab, semantic-version, scipy, ruff, rouge-chinese, joblib, importlib-resources, ffmpy, docstring-parser, aiofiles, nltk, tyro, sse-starlette, tokenizers, gradio-client, transformers, trl, peft, gradio, llamafactory
  Attempting uninstall: urllib3
    Found existing installation: urllib3 1.26.13
    Uninstalling urllib3-1.26.13:
      Successfully uninstalled urllib3-1.26.13
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.15.0
    Uninstalling tokenizers-0.15.0:
      Successfully uninstalled tokenizers-0.15.0
  Attempting uninstall: transformers
    Found existing installation: transformers 4.38.0
    Uninstalling transformers-4.38.0:
      Successfully uninstalled transformers-4.38.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lmdeploy 0.1.0-git782048c.abi0.dtk2404.torch2.1. requires transformers==4.33.2, but you have transformers 4.43.3 which is incompatible.
Successfully installed aiofiles-23.2.1 docstring-parser-0.16 ffmpy-0.4.0 gradio-4.40.0 gradio-client-1.2.0 importlib-resources-6.4.0 jieba-0.42.1 joblib-1.4.2 llamafactory-0.8.4.dev0 nltk-3.8.1 peft-0.12.0 pydub-0.25.1 rouge-chinese-1.0.3 ruff-0.5.5 scipy-1.14.0 semantic-version-2.10.0 shtab-1.7.1 sse-starlette-2.1.3 tokenizers-0.19.1 tomlkit-0.12.0 transformers-4.43.3 trl-0.9.6 tyro-0.8.5 urllib3-2.2.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip

5. 解决依赖包冲突

(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip install --no-deps -e .
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: llamafactory
  Building editable for llamafactory (pyproject.toml) ... done
  Created wheel for llamafactory: filename=llamafactory-0.8.4.dev0-0.editable-py3-none-any.whl size=20781 sha256=f874a791bc9fdca02075cda0459104b48a57d300a077eca00eee7221cde429c3
  Stored in directory: /tmp/pip-ephem-wheel-cache-7vjiq3f3/wheels/e9/b4/89/f13e921e37904ee0c839434aad2d7b2951c2c68e596667c7ef
Successfully built llamafactory
Installing collected packages: llamafactory
  Attempting uninstall: llamafactory
    Found existing installation: llamafactory 0.8.4.dev0
    Uninstalling llamafactory-0.8.4.dev0:
      Successfully uninstalled llamafactory-0.8.4.dev0
Successfully installed llamafactory-0.8.4.dev0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip

6. 安装 vllm==0.4.3

(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip list | grep llvm

[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip install --no-dependencies vllm==0.4.3
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting vllm==0.4.3
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/1a/1e/10bcb6566f4fa8b95ff85bddfd1675ff7db33ba861f59bd70aa3b92a46b7/vllm-0.4.3-cp310-cp310-manylinux1_x86_64.whl (131.1 MB)
Installing collected packages: vllm
  Attempting uninstall: vllm
    Found existing installation: vllm 0.3.3+git3380931.abi0.dtk2404.torch2.1
    Uninstalling vllm-0.3.3+git3380931.abi0.dtk2404.torch2.1:
      Successfully uninstalled vllm-0.3.3+git3380931.abi0.dtk2404.torch2.1
Successfully installed vllm-0.4.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip

四、关键步骤

1. 获取Access Token

通过Hugging Face,获取Access Token用于登录Hugging Face 账户。

注意:选择 Write 权限。

在这里插入图片描述

在这里插入图片描述

2. 登录Hugging Face 账户

推荐使用下述命令登录您的 Hugging Face 账户。

pip install --upgrade huggingface_hub
huggingface-cli login
(llama_fct) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# huggingface-cli login

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible):
Add token as git credential? (Y/n) Y
Token is valid (permission: write).
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store' credential helper as default.

git config --global credential.helper store

Read https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage for more details.
Token has not been saved to git credential helper.
Your token has been saved to /root/.cache/huggingface/token
Login successful

3. llamafactory-cli 指令

使用 llamafactory-cli help 显示帮助信息。

(llama_fct) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/mod
els/LLaMA-Factory# llamafactory-cli help
No ROCm runtime is found, using ROCM_HOME='/opt/dtk'
/opt/conda/envs/llama_fct/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Fail                                     ed to load image Python extension: 'libc10_hip.so: cannot open shared object file: No such file or d                                     irectory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this w                                     arning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `                                     libpng` installed before building `torchvision` from source?
  warn(
[2024-08-01 15:12:24,629] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to                                      cuda (auto detect)
----------------------------------------------------------------------
| Usage:                                                             |
|   llamafactory-cli api -h: launch an OpenAI-style API server       |
|   llamafactory-cli chat -h: launch a chat interface in CLI         |
|   llamafactory-cli eval -h: evaluate models                        |
|   llamafactory-cli export -h: merge LoRA adapters and export model |
|   llamafactory-cli train -h: train models                          |
|   llamafactory-cli webchat -h: launch a chat interface in Web UI   |
|   llamafactory-cli webui: launch LlamaBoard                        |
|   llamafactory-cli version: show version info                      |
----------------------------------------------------------------------

4. 快速开始

下面三行命令分别对 Llama3-8B-Instruct 模型进行 LoRA 微调合并推理

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

4.1 LoRA 微调

模型微调训练是在DCU上进行的。

4.1.1 单卡环境
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
[2024-08-01 19:06:41,134] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (a                                                                            uto detect)
08/01/2024 19:06:44 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distri                                                                            buted training: False, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2287] 2024-08-01 19:06:45,194 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 19:06:45,194 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 19:06:45,194 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 19:06:45,194 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-01 19:06:45,563 >> Special tokens have been added in the voca                                                                            bulary, make sure the associated word embeddings are fine-tuned or trained.
08/01/2024 19:06:45 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/01/2024 19:06:45 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/01/2024 19:06:45 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Converting format of dataset (num_proc=16): 100%|███████████████████| 91/91 [00:00<00:00, 444.18 examples/s]
08/01/2024 19:06:47 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Converting format of dataset (num_proc=16): 100%|██████████████| 1000/1000 [00:00<00:00, 4851.17 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|███████████████| 1091/1091 [00:02<00:00, 375.29 examples/s]
training example:
input_ids:
[128000, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 5991, 609,                                                                             39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 5991, 609, 39254, 459                                                                            , 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-01 19:06:53,502 >> loading configuration file /root/.cache/modelsc                                                                            ope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 19:06:53,503 >> Model config LlamaConfig {
  "_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|modeling_utils.py:3631] 2024-08-01 19:06:53,534 >> loading weights file /root/.cache/modelscope/hub/LL                                                                            M-Research/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-01 19:06:53,534 >> Instantiating LlamaForCausalLM model under default                                                                             dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-01 19:06:53,536 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}

Loading checkpoint shards: 100%|██████████████████████████████████████████████| 4/4 [00:08<00:00,  2.04s/it]
[INFO|modeling_utils.py:4463] 2024-08-01 19:07:01,775 >> All model checkpoint weights were used when initial                                                                            izing LlamaForCausalLM.

[INFO|modeling_utils.py:4471] 2024-08-01 19:07:01,775 >> All the weights of LlamaForCausalLM were initialize                                                                            d from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaFor                                                                            CausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-01 19:07:01,779 >> loading configuration file /root/.cache/modelsc                                                                            ope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-01 19:07:01,780 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}

08/01/2024 19:07:01 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/01/2024 19:07:01 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementati                                                                            on.
08/01/2024 19:07:01 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/01/2024 19:07:01 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/01/2024 19:07:01 - INFO - llamafactory.model.model_utils.misc - Found linear modules: q_proj,up_proj,v_pr                                                                            oj,down_proj,k_proj,o_proj,gate_proj
08/01/2024 19:07:04 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:648] 2024-08-01 19:07:04,471 >> Using auto half precision backend
[INFO|trainer.py:2134] 2024-08-01 19:07:04,831 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-01 19:07:04,831 >>   Num examples = 981
[INFO|trainer.py:2136] 2024-08-01 19:07:04,831 >>   Num Epochs = 3
[INFO|trainer.py:2137] 2024-08-01 19:07:04,832 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-08-01 19:07:04,832 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2141] 2024-08-01 19:07:04,832 >>   Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-08-01 19:07:04,832 >>   Total optimization steps = 366
[INFO|trainer.py:2143] 2024-08-01 19:07:04,836 >>   Number of trainable parameters = 20,971,520
{'loss': 1.5025, 'grad_norm': 1.3309401273727417, 'learning_rate': 2.702702702702703e-05, 'epoch': 0.08}
{'loss': 1.3424, 'grad_norm': 1.8096668720245361, 'learning_rate': 5.405405405405406e-05, 'epoch': 0.16}
{'loss': 1.1286, 'grad_norm': 1.2990491390228271, 'learning_rate': 8.108108108108109e-05, 'epoch': 0.24}
{'loss': 0.9808, 'grad_norm': 1.1075998544692993, 'learning_rate': 9.997948550797227e-05, 'epoch': 0.33}
{'loss': 0.9924, 'grad_norm': 1.8073676824569702, 'learning_rate': 9.961525153583327e-05, 'epoch': 0.41}
{'loss': 1.0052, 'grad_norm': 1.2079122066497803, 'learning_rate': 9.879896064123961e-05, 'epoch': 0.49}
{'loss': 0.9973, 'grad_norm': 1.7361079454421997, 'learning_rate': 9.753805025397779e-05, 'epoch': 0.57}
{'loss': 0.8488, 'grad_norm': 1.1059085130691528, 'learning_rate': 9.584400884284545e-05, 'epoch': 0.65}
{'loss': 0.9893, 'grad_norm': 0.8711654543876648, 'learning_rate': 9.373227124134888e-05, 'epoch': 0.73}
{'loss': 0.9116, 'grad_norm': 1.3793599605560303, 'learning_rate': 9.122207801708802e-05, 'epoch': 0.82}
{'loss': 1.0429, 'grad_norm': 1.3769993782043457, 'learning_rate': 8.833630016614976e-05, 'epoch': 0.9}
{'loss': 0.9323, 'grad_norm': 1.2503643035888672, 'learning_rate': 8.510123072976239e-05, 'epoch': 0.98}
{'loss': 0.9213, 'grad_norm': 2.449227809906006, 'learning_rate': 8.154634523184388e-05, 'epoch': 1.06}
{'loss': 0.8386, 'grad_norm': 1.009852409362793, 'learning_rate': 7.770403312015721e-05, 'epoch': 1.14}
 40%|███████████████████████████▌                                         | 146/366 [10:19<15:11,  4.14s/it]                                                                            {'loss': 0.856, 'grad_norm': 0.863474428653717, 'learning_rate': 7.360930265797935e-05, 'epoch': 1.22}
{'loss': 0.838, 'grad_norm': 0.712546169757843, 'learning_rate': 6.929946195508932e-05, 'epoch': 1.3}
{'loss': 0.8268, 'grad_norm': 1.6060960292816162, 'learning_rate': 6.481377904428171e-05, 'epoch': 1.39}
{'loss': 0.7326, 'grad_norm': 0.7863644957542419, 'learning_rate': 6.019312410053286e-05, 'epoch': 1.47}
{'loss': 0.7823, 'grad_norm': 0.8964634537696838, 'learning_rate': 5.547959706265068e-05, 'epoch': 1.55}
{'loss': 0.7599, 'grad_norm': 0.5305138826370239, 'learning_rate': 5.0716144050239375e-05, 'epoch': 1.63}
{'loss': 0.815, 'grad_norm': 0.8153926730155945, 'learning_rate': 4.594616607090028e-05, 'epoch': 1.71}
{'loss': 0.8258, 'grad_norm': 1.3266267776489258, 'learning_rate': 4.121312358283463e-05, 'epoch': 1.79}
{'loss': 0.7446, 'grad_norm': 1.8706341981887817, 'learning_rate': 3.656014051577713e-05, 'epoch': 1.88}
{'loss': 0.7539, 'grad_norm': 1.5148639678955078, 'learning_rate': 3.202961135812437e-05, 'epoch': 1.96}
{'loss': 0.7512, 'grad_norm': 1.3771291971206665, 'learning_rate': 2.7662814890184818e-05, 'epoch': 2.04}
{'loss': 0.7128, 'grad_norm': 1.420331597328186, 'learning_rate': 2.3499538082923606e-05, 'epoch': 2.12}
{'loss': 0.635, 'grad_norm': 0.9235875010490417, 'learning_rate': 1.9577713588953795e-05, 'epoch': 2.2}
{'loss': 0.6628, 'grad_norm': 1.6558737754821777, 'learning_rate': 1.5933074128684332e-05, 'epoch': 2.28}
{'loss': 0.681, 'grad_norm': 0.8138720393180847, 'learning_rate': 1.2598826920598772e-05, 'epoch': 2.36}
{'loss': 0.6707, 'grad_norm': 1.0700312852859497, 'learning_rate': 9.605351122011309e-06, 'epoch': 2.45}
{'loss': 0.6201, 'grad_norm': 1.3334729671478271, 'learning_rate': 6.979921036993042e-06, 'epoch': 2.53}
{'loss': 0.6698, 'grad_norm': 1.440247893333435, 'learning_rate': 4.746457613389904e-06, 'epoch': 2.61}
{'loss': 0.7072, 'grad_norm': 0.9171076416969299, 'learning_rate': 2.925310493105099e-06, 'epoch': 2.69}
{'loss': 0.6871, 'grad_norm': 0.9809044003486633, 'learning_rate': 1.5330726014397668e-06, 'epoch': 2.77}
{'loss': 0.5931, 'grad_norm': 1.7158288955688477, 'learning_rate': 5.824289648152126e-07, 'epoch': 2.85}
{'loss': 0.6827, 'grad_norm': 1.3241132497787476, 'learning_rate': 8.204113433559201e-08, 'epoch': 2.94}
100%|█████████████████████████████████████████████████████████████████████| 366/366 [25:42<00:00,  4.02s/it]                                                                            [INFO|trainer.py:3503] 2024-08-01 19:32:47,527 >> Saving model checkpoint to saves/llama3-8b/lora/sft/checkp                                                                            oint-366
[INFO|configuration_utils.py:731] 2024-08-01 19:32:47,556 >> loading configuration file /root/.cache/modelsc                                                                            ope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 19:32:47,557 >> Model config LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|tokenization_utils_base.py:2702] 2024-08-01 19:32:47,675 >> tokenizer config file saved in saves/llama                                                                            3-8b/lora/sft/checkpoint-366/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-01 19:32:47,677 >> Special tokens file saved in saves/llama3-                                                                            8b/lora/sft/checkpoint-366/special_tokens_map.json
[INFO|trainer.py:2394] 2024-08-01 19:32:48,046 >>

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 1543.2099, 'train_samples_per_second': 1.907, 'train_steps_per_second': 0.237, 'train_loss                                                                            ': 0.8416516305318947, 'epoch': 2.98}
100%|█████████████████████████████████████████████████████████████████████| 366/366 [25:43<00:00,  4.22s/it]
[INFO|trainer.py:3503] 2024-08-01 19:32:48,050 >> Saving model checkpoint to saves/llama3-8b/lora/sft
[INFO|configuration_utils.py:731] 2024-08-01 19:32:48,081 >> loading configuration file /root/.cache/modelsc                                                                            ope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 19:32:48,082 >> Model config LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|tokenization_utils_base.py:2702] 2024-08-01 19:32:48,191 >> tokenizer config file saved in saves/llama                                                                            3-8b/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-01 19:32:48,192 >> Special tokens file saved in saves/llama3-                                                                            8b/lora/sft/special_tokens_map.json
***** train metrics *****
  epoch                    =     2.9847
  total_flos               = 20619353GF
  train_loss               =     0.8417
  train_runtime            = 0:25:43.20
  train_samples_per_second =      1.907
  train_steps_per_second   =      0.237
Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
08/01/2024 19:32:48 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
08/01/2024 19:32:48 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
[INFO|trainer.py:3819] 2024-08-01 19:32:48,529 >>
***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-08-01 19:32:48,529 >>   Num examples = 110
[INFO|trainer.py:3824] 2024-08-01 19:32:48,529 >>   Batch size = 1
100%|█████████████████████████████████████████████████████████████████████| 110/110 [00:18<00:00,  6.07it/s]
***** eval metrics *****
  epoch                   =     2.9847
  eval_loss               =     0.9957
  eval_runtime            = 0:00:18.23
  eval_samples_per_second =      6.031
  eval_steps_per_second   =      6.031
[INFO|modelcard.py:449] 2024-08-01 19:33:06,773 >> Dropping the following result as it does not have all the                                                                             necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

输出结果

root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models# tree -L 6 LLaMA-Factory/saves/
LLaMA-Factory/saves/
`-- llama3-8b
    `-- lora
        `-- sft
            |-- README.md
            |-- adapter_config.json
            |-- adapter_model.safetensors
            |-- all_results.json
            |-- checkpoint-366
            |   |-- README.md
            |   |-- adapter_config.json
            |   |-- adapter_model.safetensors
            |   |-- optimizer.pt
            |   |-- rng_state.pth
            |   |-- scheduler.pt
            |   |-- special_tokens_map.json
            |   |-- tokenizer.json
            |   |-- tokenizer_config.json
            |   |-- trainer_state.json
            |   `-- training_args.bin
            |-- eval_results.json
            |-- special_tokens_map.json
            |-- tokenizer.json
            |-- tokenizer_config.json
            |-- train_results.json
            |-- trainer_log.jsonl
            |-- trainer_state.json
            |-- training_args.bin
            `-- training_loss.png

运行时的资源占用情况

在这里插入图片描述
在这里插入图片描述

4.1.2 多卡环境
(llama_factory_torch) root@notebook-1819291427828183041-scnlbe5oi5-51898:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0,1,2 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
[2024-08-02 18:08:58,775] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
08/02/2024 18:09:01 - INFO - llamafactory.cli - Initializing distributed tasks at: 127.0.0.1:26472
[2024-08-02 18:09:04,227] torch.distributed.run: [WARNING]
[2024-08-02 18:09:04,227] torch.distributed.run: [WARNING] *****************************************
[2024-08-02 18:09:04,227] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-08-02 18:09:04,227] torch.distributed.run: [WARNING] *****************************************
[2024-08-02 18:09:09,155] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-02 18:09:09,269] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-02 18:09:09,457] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0802 18:09:12.353489 95618 ProcessGroupNCCL.cpp:686] [Rank 2] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 180000000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=353053408
08/02/2024 18:09:12 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/02/2024 18:09:12 - INFO - llamafactory.hparams.parser - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0802 18:09:12.555290 95617 ProcessGroupNCCL.cpp:686] [Rank 1] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 180000000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=369111936
08/02/2024 18:09:12 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/02/2024 18:09:12 - INFO - llamafactory.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0802 18:09:13.120337 95616 ProcessGroupNCCL.cpp:686] [Rank 0] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 180000000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=359553664
08/02/2024 18:09:13 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/02/2024 18:09:13 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
I0802 18:09:14.158418 95618 ProcessGroupNCCL.cpp:2780] Rank 2 using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
I0802 18:09:14.165846 95617 ProcessGroupNCCL.cpp:2780] Rank 1 using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[INFO|tokenization_utils_base.py:2287] 2024-08-02 18:09:14,276 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 18:09:14,276 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 18:09:14,276 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 18:09:14,276 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-02 18:09:14,684 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
08/02/2024 18:09:14 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 18:09:14 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/02/2024 18:09:14 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Converting format of dataset (num_proc=16): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 91/91 [00:00<00:00, 301.60 examples/s]
08/02/2024 18:09:16 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Converting format of dataset (num_proc=16): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 3399.93 examples/s]
I0802 18:09:18.295866 95616 ProcessGroupNCCL.cpp:2780] Rank 0 using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
I0802 18:09:19.234498 95616 ProcessGroupNCCL.cpp:1340] NCCL_DEBUG: N/A
08/02/2024 18:09:19 - INFO - llamafactory.data.loader - Loading dataset identity.json...
08/02/2024 18:09:19 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Running tokenizer on dataset (num_proc=16):   0%|                                                                                                            | 0/1091 [00:00<?, ? examples/s]08/02/2024 18:09:20 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
08/02/2024 18:09:20 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Running tokenizer on dataset (num_proc=16): 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 1091/1091 [00:03<00:00, 273.44 examples/s]
training example:
input_ids:
[128000, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-02 18:09:24,080 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-02 18:09:24,082 >> Model config LlamaConfig {
  "_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|modeling_utils.py:3631] 2024-08-02 18:09:24,119 >> loading weights file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-02 18:09:24,119 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-02 18:09:24,121 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:09<00:00,  2.49s/it]
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.misc - Found linear modules: up_proj,gate_proj,o_proj,q_proj,k_proj,v_proj,down_proj
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:10<00:00,  2.59s/it]
[INFO|modeling_utils.py:4463] 2024-08-02 18:09:34,552 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4471] 2024-08-02 18:09:34,552 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-02 18:09:34,555 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-02 18:09:34,555 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}

08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.misc - Found linear modules: k_proj,o_proj,v_proj,down_proj,q_proj,up_proj,gate_proj
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:10<00:00,  2.52s/it]
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.misc - Found linear modules: gate_proj,down_proj,q_proj,o_proj,up_proj,v_proj,k_proj
08/02/2024 18:09:34 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
08/02/2024 18:09:34 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:648] 2024-08-02 18:09:34,983 >> Using auto half precision backend
08/02/2024 18:09:35 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
[INFO|trainer.py:2134] 2024-08-02 18:09:37,114 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-02 18:09:37,114 >>   Num examples = 981
[INFO|trainer.py:2136] 2024-08-02 18:09:37,114 >>   Num Epochs = 3
[INFO|trainer.py:2137] 2024-08-02 18:09:37,114 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-08-02 18:09:37,114 >>   Total train batch size (w. parallel, distributed & accumulation) = 24
[INFO|trainer.py:2141] 2024-08-02 18:09:37,114 >>   Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-08-02 18:09:37,114 >>   Total optimization steps = 120
[INFO|trainer.py:2143] 2024-08-02 18:09:37,119 >>   Number of trainable parameters = 20,971,520
{'loss': 1.4267, 'grad_norm': 1.401288628578186, 'learning_rate': 8.333333333333334e-05, 'epoch': 0.24}
{'loss': 1.1319, 'grad_norm': 1.4780751466751099, 'learning_rate': 9.865224352899119e-05, 'epoch': 0.49}
{'loss': 0.9963, 'grad_norm': 0.532632052898407, 'learning_rate': 9.330127018922194e-05, 'epoch': 0.73}
{'loss': 0.9792, 'grad_norm': 0.7996620535850525, 'learning_rate': 8.43120818934367e-05, 'epoch': 0.98}
{'loss': 0.937, 'grad_norm': 0.4041236639022827, 'learning_rate': 7.243995901002312e-05, 'epoch': 1.22}
{'loss': 0.8805, 'grad_norm': 0.5675532221794128, 'learning_rate': 5.868240888334653e-05, 'epoch': 1.47}
{'loss': 0.8467, 'grad_norm': 0.5038197636604309, 'learning_rate': 4.4195354293738484e-05, 'epoch': 1.71}
{'loss': 0.8612, 'grad_norm': 0.7851077914237976, 'learning_rate': 3.019601169804216e-05, 'epoch': 1.96}
{'loss': 0.818, 'grad_norm': 0.450968474149704, 'learning_rate': 1.7860619515673033e-05, 'epoch': 2.2}
{'loss': 0.8308, 'grad_norm': 0.5961077809333801, 'learning_rate': 8.225609429353187e-06, 'epoch': 2.45}
{'loss': 0.8071, 'grad_norm': 0.5323781371116638, 'learning_rate': 2.100524384225555e-06, 'epoch': 2.69}
{'loss': 0.8061, 'grad_norm': 0.7563619017601013, 'learning_rate': 0.0, 'epoch': 2.94}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [09:46<00:00,  4.56s/it][INFO|trainer.py:3503] 2024-08-02 18:19:24,273 >> Saving model checkpoint to saves/llama3-8b/lora/sft/checkpoint-120
[INFO|configuration_utils.py:731] 2024-08-02 18:19:24,304 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-02 18:19:24,305 >> Model config LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|tokenization_utils_base.py:2702] 2024-08-02 18:19:24,432 >> tokenizer config file saved in saves/llama3-8b/lora/sft/checkpoint-120/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-02 18:19:24,434 >> Special tokens file saved in saves/llama3-8b/lora/sft/checkpoint-120/special_tokens_map.json
[INFO|trainer.py:2394] 2024-08-02 18:19:24,832 >>

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 587.7138, 'train_samples_per_second': 5.008, 'train_steps_per_second': 0.204, 'train_loss': 0.9434665679931641, 'epoch': 2.94}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [09:47<00:00,  4.90s/it]
[INFO|trainer.py:3503] 2024-08-02 18:19:24,837 >> Saving model checkpoint to saves/llama3-8b/lora/sft
[INFO|configuration_utils.py:731] 2024-08-02 18:19:24,907 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-02 18:19:24,908 >> Model config LlamaConfig {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|tokenization_utils_base.py:2702] 2024-08-02 18:19:25,048 >> tokenizer config file saved in saves/llama3-8b/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-02 18:19:25,055 >> Special tokens file saved in saves/llama3-8b/lora/sft/special_tokens_map.json
***** train metrics *****
  epoch                    =     2.9358
  total_flos               = 20332711GF
  train_loss               =     0.9435
  train_runtime            = 0:09:47.71
  train_samples_per_second =      5.008
  train_steps_per_second   =      0.204
Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
08/02/2024 18:19:25 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
08/02/2024 18:19:25 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
[INFO|trainer.py:3819] 2024-08-02 18:19:25,357 >>
***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-08-02 18:19:25,357 >>   Num examples = 110
[INFO|trainer.py:3824] 2024-08-02 18:19:25,357 >>   Batch size = 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:08<00:00,  4.50it/s]
***** eval metrics *****
  epoch                   =     2.9358
  eval_loss               =     0.9702
  eval_runtime            = 0:00:08.33
  eval_samples_per_second =     13.193
  eval_steps_per_second   =      4.438
[INFO|modelcard.py:449] 2024-08-02 18:19:33,712 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

运行时的资源占用情况
在这里插入图片描述
在这里插入图片描述

4.2 模型合并

模型合并实在CPU上进行的。

(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
[2024-08-01 21:34:37,394] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[INFO|tokenization_utils_base.py:2287] 2024-08-01 21:34:41,664 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 21:34:41,664 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 21:34:41,664 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 21:34:41,664 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-01 21:34:42,030 >> Special tokens have been added in the vocabulary, make sure the associa                 ted word embeddings are fine-tuned or trained.
08/01/2024 21:34:42 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/01/2024 21:34:42 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-01 21:34:42,031 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Lla                 ma-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 21:34:42,032 >> Model config LlamaConfig {
  "_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

08/01/2024 21:34:42 - INFO - llamafactory.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3631] 2024-08-01 21:34:42,058 >> loading weights file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-In                 struct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-01 21:34:42,058 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-01 21:34:42,059 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  3.40it/s]
[INFO|modeling_utils.py:4463] 2024-08-01 21:34:43,324 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4471] 2024-08-01 21:34:43,324 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint a                 t /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions with                 out further training.
[INFO|configuration_utils.py:991] 2024-08-01 21:34:43,327 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Lla                 ma-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-01 21:34:43,327 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}

08/01/2024 21:34:43 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/01/2024 21:40:34 - INFO - llamafactory.model.adapter - Merged 1 adapter(s).
08/01/2024 21:40:34 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/llama3-8b/lora/sft
08/01/2024 21:40:34 - INFO - llamafactory.model.loader - all params: 8,030,261,248
08/01/2024 21:40:34 - INFO - llamafactory.train.tuner - Convert model dtype to: torch.bfloat16.
[INFO|configuration_utils.py:472] 2024-08-01 21:40:34,700 >> Configuration saved in models/llama3_lora_sft/config.json
[INFO|configuration_utils.py:807] 2024-08-01 21:40:34,704 >> Configuration saved in models/llama3_lora_sft/generation_config.json
[INFO|modeling_utils.py:2763] 2024-08-01 21:40:49,039 >> The model is bigger than the maximum size per checkpoint (2GB) and is going to be split in 9 checkpoint shards. You can find where each parameters has been saved in the index located at models/llama3_lora_sft/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2702] 2024-08-01 21:40:49,046 >> tokenizer config file saved in models/llama3_lora_sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-01 21:40:49,048 >> Special tokens file saved in models/llama3_lora_sft/special_tokens_map.json

输出结果

(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models# tree -L 6 LLaMA-Factory/models/llama3_lora_sft/
LLaMA-Factory/models/llama3_lora_sft/
|-- config.json
|-- generation_config.json
|-- model-00001-of-00009.safetensors
|-- model-00002-of-00009.safetensors
|-- model-00003-of-00009.safetensors
|-- model-00004-of-00009.safetensors
|-- model-00005-of-00009.safetensors
|-- model-00006-of-00009.safetensors
|-- model-00007-of-00009.safetensors
|-- model-00008-of-00009.safetensors
|-- model-00009-of-00009.safetensors
|-- model.safetensors.index.json
|-- special_tokens_map.json
|-- tokenizer.json
`-- tokenizer_config.json

运行时的资源占用情况

在这里插入图片描述

4.3 LoRA 推理

(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-20553:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
[2024-08-02 22:08:48,070] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-02 22:08:52,267 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
[INFO|tokenization_utils_base.py:2287] 2024-08-02 22:08:52,535 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 22:08:52,535 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 22:08:52,535 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 22:08:52,535 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-02 22:08:52,818 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
08/02/2024 22:08:52 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 22:08:52 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-02 22:08:52,820 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-02 22:08:52,821 >> Model config LlamaConfig {
  "_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

08/02/2024 22:08:52 - INFO - llamafactory.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3631] 2024-08-02 22:08:52,847 >> loading weights file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-02 22:08:52,847 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-02 22:08:52,848 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.98s/it]
[INFO|modeling_utils.py:4463] 2024-08-02 22:09:01,148 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4471] 2024-08-02 22:09:01,148 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-02 22:09:01,151 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-02 22:09:01,152 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}

08/02/2024 22:09:01 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/02/2024 22:09:06 - INFO - llamafactory.model.adapter - Merged 1 adapter(s).
08/02/2024 22:09:06 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/llama3-8b/lora/sft
08/02/2024 22:09:06 - INFO - llamafactory.model.loader - all params: 8,030,261,248
Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.

User: 中国深圳有哪些旅游景点
Assistant: 深圳是一个非常有名的旅游城市,拥有许多名副其名的旅游景点。以下是一些主要的旅游景点:

1. Window of the World:这是一个规模宏大的主题公园,展示了世界各地的风土人情。

2. Splendid China Miniature Theme Park:这个公园展现了中国的历史和文化,拥有许多精致的模型和景观。

3. Dafen Oil Painting Village:这个村庄是中国最大的油画村,拥有数以万计的油画作品,展示了中国油画的技艺。

4. Dameisha Beach:这个沙滩是深圳最为人知的旅游景点之一,拥有洁洁的沙滩和清澈的海水,是一个非常适合休闲的场所。

5. Mangrove Forest Nature Reserve:这个自然保护区拥有广泛的 mangrove 森林,展示了中国的自然景观。

6. Shenzhen Museum:这个博物馆展现了深圳的历史和文化,拥有许多历史和艺术的收藏品。

7. Lianhua Mountain Park:这个公园是深圳最大的公园,拥有许多山路和景观,展示了中国的自然美景。

8. Shenzhen Bay Sports Center:这个体育中心拥有许多不同的运动场所,展示了中国的体育技艺。

9. OCT-LOFT:这个文化区拥有许多艺术和文化的项目,展示了中国的艺术和文化。

10. Fairy Lake Botanical Garden:这个植物园拥有许多不同的植物和花卉,展示了中国的自然美景。

User: 中国广州有哪些旅游景点
Assistant: 广州是一个非常有名的旅游城市,拥有许多名副其名的旅游景点。以下是一些主要的旅游景点:

1. Canton Tower:这是一个位于广州的超高建筑,拥有360度的观景台,展示了广州的全景。

2. Chimelong Paradise:这个主题公园拥有许多不同的游乐设施和景观,展示了中国的游乐技艺。

3. Baiyun Mountain:这个山区拥有许多不同的景观和游乐设施,展示了中国的自然美景。

4. Yuexiu Park:这个公园是广州最大的公园,拥有许多不同的景观和游乐设施,展示了中国的自然美景。

5. Temple of the Six Banyan Trees:这个寺庙拥有许多不同的文化和历史的收藏品,展示了中国的历史和文化。

6. Museum of the Chinese Revolution:这个博物馆展现了中国革命的历史和文化,拥有许多不同的收藏品和展品。

7. Guangzhou Tower:这个塔楼是广州最早的建筑,拥有许多不同的景观和游乐设施,展示了中国的历史和文化。

8. Guangzhou Museum:这个博物馆展现了广州的历史和文化,拥有许多不同的收藏品和展品。

9. Flower Street:这个街区拥有许多不同的花卉和景观,展示了中国的自然美景。

10. Shamian Island:这个岛区拥有许多不同的景观和游乐设施,展示了中国的自然美景和历史文化。

五、FAQ

Q:OSError: You are trying to access a gated repo. Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.

(llama_fct) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/mod
els/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
No ROCm runtime is found, using ROCM_HOME='/opt/dtk'
/opt/conda/envs/llama_fct/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Fail                                     ed to load image Python extension: 'libc10_hip.so: cannot open shared object file: No such file or d                                     irectory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this w                                     arning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `                                     libpng` installed before building `torchvision` from source?
  warn(
[2024-08-01 15:13:21,242] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to                                      cuda (auto detect)
08/01/2024 15:13:24 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cpu, n_gpu: 0, distributed training: False, compute dtype: torch.bfloat16
[INFO|tokenization_auto.py:682] 2024-08-01 15:13:25,152 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
Traceback (most recent call last):
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://hf-mirror.com/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1347, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1854, in _raise_on_head_call_error
    raise head_call_error
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1751, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1673, in get_hf_file_metadata
    r = _request_wrapper(
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 376, in _request_wrapper
    response = _request_wrapper(
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 400, in _request_wrapper
    hf_raise_for_status(response)
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66ab3595-53663c2f4d5cf81405b65b9e;080cfa15-3220-4ab1-b123-4a32ba31a03a)

Cannot access gated repo for url https://hf-mirror.com/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/llama_fct/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 44, in run_sft
    tokenizer_module = load_tokenizer(model_args)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 69, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 853, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 972, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/utils/hub.py", line 420, in cached_file
    raise EnvironmentError(
OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.
401 Client Error. (Request ID: Root=1-66ab3595-53663c2f4d5cf81405b65b9e;080cfa15-3220-4ab1-b123-4a32ba31a03a)

Cannot access gated repo for url https://hf-mirror.com/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.

错误原因:默认是从Hugging Face中获取模型,由于Hugging Face 模型授权失败,导致获取模型失败。

解决方法:从modelscope下载模型。

export USE_MODELSCOPE_HUB=1 # Windows 使用 `set USE_MODELSCOPE_HUB=1`

model_name_or_path 设置为模型 ID 来加载对应的模型。在魔搭社区查看所有可用的模型,例如 LLM-Research/Meta-Llama-3-8B-Instruct

修改 llama3_lora_sft.yaml 文件:

# model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
改为
model_name_or_path: LLM-Research/Meta-Llama-3-8B-Instruct
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

Q:OSError: LLM-Research/Meta-Llama-3-8B-Instruct is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
[2024-08-01 21:17:22,212] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://hf-mirror.com/LLM-Research/Meta-Llama-3-8B-Instruct/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error
    raise head_call_error
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
    hf_raise_for_status(response)
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 352, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-66ab8ae6-4ed0547e1f86fcb201b723f8;acee559e-0676-48e4-8871-b6eb58e797ca)

Repository Not Found for url: https://hf-mirror.com/LLM-Research/Meta-Llama-3-8B-Instruct/resolve/main/tokenizer_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/llama_factory_torch/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 81, in main
    run_chat()
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 125, in run_chat
    chat_model = ChatModel()
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 44, in __init__
    self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 53, in __init__
    tokenizer_module = load_tokenizer(model_args)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 69, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 833, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 665, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/utils/hub.py", line 425, in cached_file
    raise EnvironmentError(
OSError: LLM-Research/Meta-Llama-3-8B-Instruct is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

错误原因:找不到 LLM-Research/Meta-Llama-3-8B-Instruct模型。

解决方法:从modelscope下载模型。

export USE_MODELSCOPE_HUB=1

Q:ModuleNotFoundError: No module named 'modelscope'

(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
[2024-08-01 19:05:15,320] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (a                                                                            uto detect)
08/01/2024 19:05:18 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distri                                                                            buted training: False, compute dtype: torch.bfloat16
Traceback (most recent call last):
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/extras/misc.py", line 219, i                                                                            n try_download_model_from_ms
    from modelscope import snapshot_download
ModuleNotFoundError: No module named 'modelscope'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/llama_factory_torch/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in                                                                             run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line                                                                             44, in run_sft
    tokenizer_module = load_tokenizer(model_args)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 67, i                                                                            n load_tokenizer
    init_kwargs = _get_init_kwargs(model_args)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 52, i                                                                            n _get_init_kwargs
    model_args.model_name_or_path = try_download_model_from_ms(model_args)
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/extras/misc.py", line 224, i                                                                            n try_download_model_from_ms
    raise ImportError("Please install modelscope via `pip install modelscope -U`")
ImportError: Please install modelscope via `pip install modelscope -U`

错误原因:缺少modelscope依赖包。

解决方法:安装modelscope

(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip install --no-dependencies modelscope
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting modelscope
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/38/37/9fe505ebc67ba5e0345a69d6e8b2ee8630523975b484                                                                            d221691ef60182bd/modelscope-1.16.1-py3-none-any.whl (5.7 MB)
Installing collected packages: modelscope
Successfully installed modelscope-1.16.1

Q:ImportError: /PATH/TO/site-packages/torch/lib/libtorch_hip.so: undefined symbol: ncclCommInitRankConfig

(llama_fct_pytorch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/mod
els/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
Traceback (most recent call last):
  File "/opt/conda/envs/llama_fct_pytorch/bin/llamafactory-cli", line 5, in <module>
    from llamafactory.cli import main
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/__init__.py", line 38, in <module>
    from .cli import VERSION
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 21, in <module>
    from . import launcher
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/launcher.py", line 15, in <module>
    from llamafactory.train.tuner import run_exp
  File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/tuner.py", line 19, in <module>
    import torch
  File "/opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import *  # noqa: F403
ImportError: /opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/lib/libtorch_hip.so: undefined symbol: ncclCommInitRankConfig
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import *  # noqa: F403
ImportError: /opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/lib/libtorch_hip.so: undefined symbol: ncclCommInitRankConfig

错误原因:当前PyTorch版本不支持DCU。

该问题的解决方法,请参考下文的FAQ。

Q:PyTorch版本不支持DCU

(llama_fct) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaM
A-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
No ROCm runtime is found, using ROCM_HOME='/opt/dtk'
/opt/conda/envs/llama_fct/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to lo                                                                            ad image Python extension: 'libc10_hip.so: cannot open shared object file: No such file or directory'If you                                                                             don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there                                                                             might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before buildin                                                                            g `torchvision` from source?
  warn(
[2024-08-01 17:49:08,805] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (a                                                                            uto detect)
08/01/2024 17:49:12 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cpu, n_gpu: 0, distribut                                                                            ed training: False, compute dtype: torch.bfloat16
Downloading: 100%|█████████████████████████████████████████████████████████| 654/654 [00:00<00:00, 2.56kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 48.0/48.0 [00:00<00:00, 183B/s]
Downloading: 100%|███████████████████████████████████████████████████████████| 187/187 [00:00<00:00, 759B/s]
Downloading: 100%|█████████████████████████████████████████████████████| 7.62k/7.62k [00:00<00:00, 29.9kB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 4.63G/4.63G [01:33<00:00, 53.4MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 4.66G/4.66G [01:02<00:00, 79.9MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 4.58G/4.58G [01:00<00:00, 81.7MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 1.09G/1.09G [00:22<00:00, 51.6MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 23.4k/23.4k [00:00<00:00, 53.6kB/s]
Downloading: 100%|██████████████████████████████████████████████████████| 36.3k/36.3k [00:00<00:00, 125kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 73.0/73.0 [00:00<00:00, 293B/s]
Downloading: 100%|█████████████████████████████████████████████████████| 8.66M/8.66M [00:00<00:00, 13.5MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 49.8k/49.8k [00:00<00:00, 90.0kB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 4.59k/4.59k [00:00<00:00, 18.7kB/s]
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,510 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,511 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,511 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,511 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-01 17:53:53,854 >> Special tokens have been added in the voca                                                                            bulary, make sure the associated word embeddings are fine-tuned or trained.
08/01/2024 17:53:53 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/01/2024 17:53:53 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/01/2024 17:53:53 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Generating train split: 91 examples [00:00, 10580.81 examples/s]
Converting format of dataset (num_proc=16): 100%|███████████████████| 91/91 [00:00<00:00, 427.78 examples/s]
08/01/2024 17:53:56 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Generating train split: 1000 examples [00:00, 66788.28 examples/s]
Converting format of dataset (num_proc=16): 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 4688.60 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1091/1091 [00:03<00:00, 295.08 examples/s]
training example:
input_ids:
[128000, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-01 17:54:02,106 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 17:54:02,108 >> Model config LlamaConfig {
  "_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

[INFO|modeling_utils.py:3631] 2024-08-01 17:54:02,139 >> loading weights file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-01 17:54:02,140 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-01 17:54:02,142 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.68it/s]
[INFO|modeling_utils.py:4463] 2024-08-01 17:54:03,708 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4471] 2024-08-01 17:54:03,709 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-01 17:54:03,712 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-01 17:54:03,713 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}

08/01/2024 17:54:03 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/01/2024 17:54:03 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/01/2024 17:54:03 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/01/2024 17:54:03 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/01/2024 17:54:03 - INFO - llamafactory.model.model_utils.misc - Found linear modules: q_proj,down_proj,o_proj,k_proj,gate_proj,up_proj,v_proj
08/01/2024 17:54:08 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:648] 2024-08-01 17:54:08,091 >> Using cpu_amp half precision backend
[INFO|trainer.py:2134] 2024-08-01 17:54:09,008 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-01 17:54:09,008 >>   Num examples = 981
[INFO|trainer.py:2136] 2024-08-01 17:54:09,008 >>   Num Epochs = 3
[INFO|trainer.py:2137] 2024-08-01 17:54:09,008 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-08-01 17:54:09,008 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2141] 2024-08-01 17:54:09,008 >>   Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-08-01 17:54:09,008 >>   Total optimization steps = 366
[INFO|trainer.py:2143] 2024-08-01 17:54:09,012 >>   Number of trainable parameters = 20,971,520
  0%|                                                                                                                                                           | 0/366 [00:00<?, ?it/s

错误原因:当前PyTorch不支持DCU,导致程序卡住,模型无法微调训练。

解决方法:在光合社区中查询并下载安装PyTorch。以 torch-2.1.0+das1.1.git3ac1bdd.abi1.dtk2404-cp310-cp310-manylinux_2_31_x86_64 为例,尝试安装 torch-2.1.0

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1976198.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

C#中的Winform基础

program 每个Windows应用程序都会有一个Program类——程序入口点 [STAThread] ----指示应用程序的COM线程模型是单线程单元&#xff08;如果无此特性&#xff0c;无法工作&#xff09; static voidMain() —— 入口 System.Windows.Forms.Application类提供一系列静态方法和…

【C++】一堆数组案例 元素逆置

所谓元素逆置就是把一堆数组的元素顺序反过来 例如一堆数组的为 1&#xff0c;2&#xff0c;3&#xff0c;4 那么它的逆置为 4&#xff0c;3&#xff0c;2&#xff0c;1 逆置过程运用赋值存储的思想&#xff0c;先把第一个数组存贮到一个变量中&#xff0c;然后把末尾数组…

开源LivePortrait,快速实现表情包自定义

最近可灵AI很火&#xff0c;看到网上生成的效果也很赞啊&#xff0c;之前发现快手可灵开源了LivePortrait&#xff0c;今天去玩了一下&#xff0c;很有意思。 比如下图官方展示效果&#xff1a; 这些图片开始自带表情了&#xff0c;主要就是通过LivePortrait来实现。 LivePor…

浏览器用户文件夹详解 - Top Sites(七)

1. TopSites简介 1.1 什么是TopSites文件&#xff1f; TopSites文件是Chromium浏览器中用于存储用户访问频率最高的网站信息的一个重要文件。每当用户在浏览器中访问网站时&#xff0c;这些信息都会被记录在TopSites文件中。通过这些记录&#xff0c;浏览器可以为用户提供个性…

校园抢课助手【7】-抢课接口限流

在上一节中&#xff0c;该接口已经接受过风控的处理&#xff0c;过滤掉了机器人脚本请求&#xff0c;剩下都是人为的下单请求。为了防止用户短时间内高频率点击抢课链接&#xff0c;海量请求造成服务器过载&#xff0c;这里使用接口限流算法。 先介绍下几种常用的接口限流策略…

脚拉脚模型笔记

脚拉脚模型 ⌈♪⌋例题&#xff1a; 辅助线&#xff08;中点&#xff09;做法&#xff1a; 倍长中线Rt △ △ △ 斜边中线等腰 △ △ △ 三线合一中位线 需要&#xff1a;两个等腰三角形&#xff0c;顶角互补 共__底点__ 底角需要连接 解&#xff1a; ∵ D Q 1 / 2 A B O…

中国人工智能最好50所大学排名-2024年最强学校名单

人工智能最强的学校包含&#xff1a;清华大学、上海交通大学、南京大学、西安电子科技大学、电子科技大学、中国科学技术大学、哈尔滨工业大学、华中科技大学、东南大学、浙江大学等学校。这些都是人工智能专业排名全国前十的名牌大学。 圆梦小灯塔将在下文继续为2024年高考生…

鸿蒙应用开发 DevEcoStudio 汉化

步骤 DevEcoStudio 是默认支持中文的&#xff0c;只是默认是关闭的&#xff0c;需要在已安装的插件中搜索 Chinese 关键字&#xff0c;然后启用并重启即可&#xff08;注意&#xff1a;是在已安装的插件中搜索&#xff09;。 1. 2. 3. 重启就行

滚珠花键:新能源汽车传动系统的核心动力传递者

在日常生活中&#xff0c;汽车已经成为了必不可少的交通工具&#xff0c;尤其是新能源汽车。而滚珠花键作为传动系统中的重要组成部分&#xff0c;在传动系统方面的作用不容忽视。 随着科技的不断发展&#xff0c;汽车行业也在不断进步&#xff0c;滚珠花键作为高精度的机械传动…

PE安装win11原版系统“无法创建新的分区,也找不到现有的分区”和“windows无法对计算机进行启动到下一个安装阶段”的解决办法

问题1 针对“无法创建新的分区&#xff0c;也找不到现有的分区”&#xff1a; 解决办法&#xff1a; 用Diskgenius等分区工具删除整个分区&#xff0c;不要在分区工具里新建分区&#xff0c;而是在安装系统选择安装磁盘的时候&#xff0c;直接选择这个磁盘&#xff0c;从而完成…

五. TensorRT API的基本使用-build-model-from-scratch

目录 前言0. 简述1. 案例运行2. 代码分析2.1 main.cpp2.2 model.cpp 3. 案例3.1 sample_conv3.2 sample_permute3.3 sample_reshape3.4 sample_batchNorm3.5 sample_cbr 4. 补充说明总结下载链接参考 前言 自动驾驶之心推出的 《CUDA与TensorRT部署实战课程》&#xff0c;链接。…

《学会 SpringMVC 系列 · 写入拦截器 ResponseBodyAdvice》

&#x1f4e2; 大家好&#xff0c;我是 【战神刘玉栋】&#xff0c;有10多年的研发经验&#xff0c;致力于前后端技术栈的知识沉淀和传播。 &#x1f497; &#x1f33b; CSDN入驻不久&#xff0c;希望大家多多支持&#xff0c;后续会继续提升文章质量&#xff0c;绝不滥竽充数…

3.4数组和特殊矩阵

3.4.1数组的定义 数组是由n个相同类型的数据元素构成的有序序列 数组是线性表的推广,一个数组可以视为一个线性表 数组一旦被定义,其长度不会再改变,所以数组只会有存取元素和修改元素的操作 3.4.2数组的存储结构 多维数组 有两种映射方法:按行优先和按列优先 按行优先 …

2024 年最值得阅读的 10 个外国技术网站

从网络上数以千计的博客中挑选出最好的技术网站&#xff0c;并根据相关性、权威性、社交媒体关注者和新鲜度进行排名。 1. TechCrunch TechCrunch 是一家领先的科技媒体&#xff0c;致力于深入分析初创公司、评论新的互联网产品和发布科技新闻。该网站是科技专业人士和爱好者…

【传知代码】实体关系抽取(论文复现)

当谈论信息提取领域的最前沿时&#xff0c;实体关系抽取无疑是其中一颗耀眼的明星。从大数据时代的信息海洋中提炼出有意义的关系&#xff0c;不仅是科技进步的体现&#xff0c;更是人类对知识管理和智能决策迫切需求的响应。本文将探索实体关系抽取的核心技术、应用场景及其在…

域控搭建(windows 2012 R2和win10)

域控搭建 环境准备 两台windows虚拟机 主域控为&#xff1a;windows server2012 子域为&#xff1a;win10 虚拟机设置网段 Win10网络设置 Windows server2012网络设置 Windows server2012网络适配器 设置 识别成功 更改计算机名字 等待重启 Win10网络适配器 设置 识别成功 …

opencv-图像透视变换

透射变换是视角变化的结果&#xff0c;是指利用透视中心&#xff0c;像点&#xff0c;目标点共线的条件&#xff0c;按透视旋转定律使承影面(透视面)绕迹线(透视轴旋转某一角度&#xff0c;破坏原有的投影光束&#xff0c;仍能保持承影面上投影几何图形不变的变化) 它的本质将图…

QT实现步进电机控制和IMU数据读取显示

实现功能&#xff1a; 1.两步进电机分别使能和循环运动&#xff0c;可以设置循环次数、循环里分别运行的角度、旋转的速度和加减速度等等&#xff0c;在最下方的表格里显示发送和接收的CAN报文 2.读取水平电机当前位置和速度并画图显示&#xff0c;示波器暂停、缩放、滑动等功…

CVPR24《Neural Markov Random Field for Stereo Matching》

论文地址&#xff1a; https://arxiv.org/abs/2403.11193 源码地址&#xff1a; https://github.com/aeolusguan/NMRF 概述 手工设计的MRF模型在传统的立体匹配中占据主导地位&#xff0c;但与端到端的深度学习模型相比&#xff0c;其建模准确性不足。尽管深度学习大大改进了MR…

力扣SQL50 修复表中的名字 字符串函数

Problem: 1667. 修复表中的名字 &#x1f468;‍&#x1f3eb; 参考题解 select user_id, CONCAT(UPPER(left(name, 1)), LOWER(RIGHT(name, length(name) - 1))) as name from Users order by user_id