Large Model Deployment Notes (7): LLaMA 2 + Jetson AGX Orin

1. Overview

Organization: Meta (Facebook)

Repository: GitHub - facebookresearch/llama: Inference code for LLaMA models

Models: llama-2-7b, llama-2-7b-chat

Download: via the provided download.sh script

Hardware: Jetson AGX Orin

2. Downloading the Code and Model

cd /home1/zhanghui

git clone https://github.com/facebookresearch/llama

Open the official Llama 2 repository page: GitHub - facebookresearch/llama: Inference code for LLaMA models

Click "request a new download link".

Here we have to get a bit creative:

Fill in the form (remember: do not select China as the country) and click Accept.

You will then receive an email in your inbox:

Set it aside for now; we will need it in a moment.

cd /home1/zhanghui

cd llama

./download.sh

When prompted, enter the URL from the email and the model type (let's start with 7B); the script will begin downloading the model files:

Wait patiently; the files are downloaded into the current directory and into ./llama-2-7b:

The download script is well written: it retries when a transfer stalls, and it resumes interrupted downloads. What you would expect from a big company.
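For the record, resumable downloading is easy to replicate. Here is a minimal Python sketch of the same idea (illustrative only, not the actual download.sh logic; url and dest are placeholder arguments):

import os
import urllib.request

def resume_download(url: str, dest: str) -> None:
    # Send a Range header so a server that supports ranges returns
    # only the bytes we are missing (206 Partial Content); we then
    # append them to the existing partial file.
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers={"Range": f"bytes={have}-"})
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        while chunk := resp.read(1 << 20):  # 1 MiB per read
            out.write(chunk)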

Download complete.
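A quick way to confirm everything landed where it should; the file names below follow the official 7B download layout:

from pathlib import Path

# sanity-check the expected files from the 7B download
expected = ["llama-2-7b/consolidated.00.pth", "llama-2-7b/params.json",
            "llama-2-7b/checklist.chk", "tokenizer.model"]
for f in expected:
    print(f, "OK" if Path(f).exists() else "MISSING")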

3. Installing Dependencies

Open a terminal and create a conda environment named llama.

conda create -n llama python=3.8

conda activate llama

cd /home1/zhanghui

cd llama

Install the llama package in editable mode (pip install -e . needs to run from the repo root, where setup.py lives):

pip install -e .

Note that this initially pulls in a stock torch 2.0 wheel, which we now need to swap out for the Jetson-specific build.

cd ..

pip install ./torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl
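Before going further, it is worth a quick check that the Jetson wheel imports cleanly and sees the GPU:

import torch

print(torch.__version__)           # expect something like 2.0.0+nv23.05
print(torch.cuda.is_available())   # should print True on the Orin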

cd llama

torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4

The run fails with "ImportError: cannot import name 'Store' from 'torch.distributed'" (see https://forums.developer.nvidia.com/t/importerror-cannot-import-name-store-from-torch-distributed/262235).

Apparently the fix is to build PyTorch from source and produce your own Jetson wheel. That sounds fairly involved.

What if we try torch 2.1 instead?

pip install torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl

cd llama

torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4

The result appears to be the same.

So, how do we build torch on the Jetson AGX Orin itself?

Open https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048

and find the "Build from source" instructions:

This is "risky" work...

cd /home1/zhanghui

mkdir newpytorch

cd newpytorch

conda activate llama

sudo nvpmodel -m 0

sudo jetson_clocks
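(For what it's worth: nvpmodel -m 0 selects the Orin's maximum-performance power profile, and jetson_clocks pins the clocks at their highest rates. Both help with the long build ahead.)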

git clone --recursive --branch v2.0.1 https://github.com/pytorch/pytorch

export USE_NCCL=0

export USE_DISTRIBUTED=1 # skip setting this if you want to enable OpenMPI backend

export USE_QNNPACK=0

export USE_PYTORCH_QNNPACK=0

export TORCH_CUDA_ARCH_LIST="7.2;8.7"

export PYTORCH_BUILD_VERSION=2.0.1

export PYTORCH_BUILD_NUMBER=1
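A few notes on these flags, which follow the forum post: USE_DISTRIBUTED=1 keeps torch.distributed in the build (the llama example depends on it), TORCH_CUDA_ARCH_LIST="7.2;8.7" targets both Xavier (sm_72) and Orin (sm_87), and USE_NCCL=0 builds without NCCL support. Keep that last one in mind.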

cd pytorch

pip install -r requirements.txt

pip install scikit-build

pip install ninja

python3 setup.py bdist_wheel

Wait patiently for the wheel to build (this can take a few hours on the Orin)...

The build succeeds, and the finished .whl file is in the dist directory:

Let's install it:

cd dist

pip install ./torch-2.0.1-cp38-cp38-linux_aarch64.whl

cd /home1/zhanghui

cd llama

torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4

Running it again, the example still fails. Looking back at the build flags:

Alas, one flag was never changed:

export USE_NCCL=0

The wheel was built without NCCL, and the example initializes its process group with the nccl backend. (Not sure whether the other two flags need changing as well...)
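Incidentally, this kind of build flag can be probed from Python without waiting for torchrun to fail. A minimal check, worth re-running after installing the fixed wheel:

import torch
import torch.distributed as dist

print(torch.__version__)
print(dist.is_available())       # False if built with USE_DISTRIBUTED=0
print(dist.is_nccl_available())  # False if built with USE_NCCL=0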

Back up the old wheel, wipe the build directory, and rebuild.

cd /home1/zhanghui

cd newpytorch/pytorch

cd build

rm -rf *

cd ..

export USE_NCCL=1

export USE_DISTRIBUTED=1 # skip setting this if you want to enable OpenMPI backend

export USE_QNNPACK=0

export USE_PYTORCH_QNNPACK=0

export TORCH_CUDA_ARCH_LIST="7.2;8.7"

export PYTORCH_BUILD_VERSION=2.0.1

export PYTORCH_BUILD_NUMBER=1

python3 setup.py bdist_wheel

Again, wait patiently for the build to finish.

cd dist

pip install ./torch-2.0.1-cp38-cp38-linux_aarch64.whl --force-reinstall

4. Deployment Verification

cd /home1/zhanghui

cd llama

torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4

The output is as follows:

(llama) zhanghui@ubuntu:/home1/zhanghui/newpytorch/pytorch/dist$ cd /home1/zhanghui
(llama) zhanghui@ubuntu:/home1/zhanghui$ cd llama
(llama) zhanghui@ubuntu:/home1/zhanghui/llama$ torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/    --tokenizer_path tokenizer.model    --max_seq_len 128 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 30.33 seconds
I believe the meaning of life is
> to love.
I believe the meaning of life is to love. We were created to love and be loved. We were created to love God and to love our neighbor as ourselves. We were created to love our spouse, our children, our family, our friends, our community, and our world.
We

==================================

Simply put, the theory of relativity states that
> 1) the laws of physics are the same for all non-accelerating observers, and 2) the speed of light is the same for all observers, regardless of their relative motion or their gravitational potential.
The first statement is the most important. It is the basis for the second.

==================================

A brief message congratulating the team on the launch:

        Hi everyone,

        I just
> wanted to let you know that the team is pleased to announce the launch of the new site.  We hope that you like the new design and that it makes it easier to find the information that you are looking for.  Please take a few minutes to let us know what you think by taking our quick survey.

==================================

Translate English to French:

        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush girafe => girafe peluche
        cheese =>
> fromage
        grilled cheese => sandwich au fromage
        giraffe => girafe
        candy cane => canne à sucre
        candy => sucre
        peppermint candy => sucre à la menthe poivrée
        pe

==================================

(llama) zhanghui@ubuntu:/home1/zhanghui/llama$

example_text_completion.py performs a simple text-completion task:

# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.

import fire

from llama import Llama
from typing import List

def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 128,
    max_gen_len: int = 64,
    max_batch_size: int = 4,
):
    """
    Entry point of the program for generating text using a pretrained model.

    Args:
        ckpt_dir (str): The directory containing checkpoint files for the pretrained model.
        tokenizer_path (str): The path to the tokenizer model used for text encoding/decoding.
        temperature (float, optional): The temperature value for controlling randomness in generation.
            Defaults to 0.6.
        top_p (float, optional): The top-p sampling parameter for controlling diversity in generation.
            Defaults to 0.9.
        max_seq_len (int, optional): The maximum sequence length for input prompts. Defaults to 128.
        max_gen_len (int, optional): The maximum length of generated sequences. Defaults to 64.
        max_batch_size (int, optional): The maximum batch size for generating sequences. Defaults to 4.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    prompts: List[str] = [
        # For these prompts, the expected answer is the natural continuation of the prompt
        "I believe the meaning of life is",
        "Simply put, the theory of relativity states that ",
        """A brief message congratulating the team on the launch:

        Hi everyone,

        I just """,
        # Few shot prompt (providing a few examples before asking model to complete more);
        """Translate English to French:

        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush girafe => girafe peluche
        cheese =>""",
    ]
    results = generator.text_completion(
        prompts,
        max_gen_len=max_gen_len,
        temperature=temperature,
        top_p=top_p,
    )
    for prompt, result in zip(prompts, results):
        print(prompt)
        print(f"> {result['generation']}")
        print("\n==================================\n")


if __name__ == "__main__":
    fire.Fire(main)
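One detail worth spelling out: Llama.build asserts that the number of checkpoint shards in ckpt_dir matches the model-parallel world size, which is why --nproc_per_node is 1 for the single-shard 7B model (per the repo README, 13B uses 2 shards and 70B uses 8).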

Here are the first three prompts:

# For these prompts, the expected answer is the natural continuation of the prompt

"I believe the meaning of life is",

"Simply put, the theory of relativity states that ",

"""A brief message congratulating the team on the launch:

Hi everyone,

I just """,

and their completions:

I believe the meaning of life is to love. We were created to love and be loved. We were created to love God and to love our neighbor as ourselves. We were created to love our spouse, our children, our family, our friends, our community, and our world. We

Simply put, the theory of relativity states that 1) the laws of physics are the same for all non-accelerating observers, and 2) the speed of light is the same for all observers, regardless of their relative motion or their gravitational potential. The first statement is the most important. It is the basis for the second.

Hi everyone, I just wanted to let you know that the team is pleased to announce the launch of the new site. We hope that you like the new design and that it makes it easier to find the information that you are looking for. Please take a few minutes to let us know what you think by taking our quick survey.

The first completion seems cut off mid-sentence; the latter two look fine.
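The truncation is expected rather than a bug: the script caps each completion at max_gen_len = 64 tokens, with prompt plus output bounded by max_seq_len. Since fire.Fire(main) turns every parameter of main() into a command-line flag, an untested way to give the model more room would be:

torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 256 --max_gen_len 192 --max_batch_size 4

bearing in mind that a larger max_seq_len costs more GPU memory.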

Now let's try changing the prompt.

example_text_completion_1.py

# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.

import fire

from llama import Llama
from typing import List

def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 128,
    max_gen_len: int = 64,
    max_batch_size: int = 4,
):
    """
    Entry point of the program for generating text using a pretrained model.

    Args:
        ckpt_dir (str): The directory containing checkpoint files for the pretrained model.
        tokenizer_path (str): The path to the tokenizer model used for text encoding/decoding.
        temperature (float, optional): The temperature value for controlling randomness in generation.
            Defaults to 0.6.
        top_p (float, optional): The top-p sampling parameter for controlling diversity in generation.
            Defaults to 0.9.
        max_seq_len (int, optional): The maximum sequence length for input prompts. Defaults to 128.
        max_gen_len (int, optional): The maximum length of generated sequences. Defaults to 64.
        max_batch_size (int, optional): The maximum batch size for generating sequences. Defaults to 4.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    prompts: List[str] = [
        # For these prompts, the expected answer is the natural continuation of the prompt
        "Hello, I am Zhanghui, Now I want to tell you something about me ",
    ]
    results = generator.text_completion(
        prompts,
        max_gen_len=max_gen_len,
        temperature=temperature,
        top_p=top_p,
    )
    for prompt, result in zip(prompts, results):
        print(prompt)
        print(f"> {result['generation']}")
        print("\n==================================\n")


if __name__ == "__main__":
    fire.Fire(main)

torchrun --nproc_per_node 1 example_text_completion_1.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4

The output is as follows:

(llama) zhanghui@ubuntu:/home1/zhanghui/llama$ torchrun --nproc_per_node 1 example_text_completion_1.py --ckpt_dir llama-2-7b/    --tokenizer_path tokenizer.model    --max_seq_len 128 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 20.56 seconds
Hello, I am Zhanghui, Now I want to tell you something about me
> 🙂
I am a Chinese girl. I like music, art, photography, travel, nature and etc. I love learning new things and I am always open to new ideas. I am a very positive person and I like to laugh. I am an open-minded person and I like to

==================================

(llama) zhanghui@ubuntu:/home1/zhanghui/llama$

LOL, you know far too much about me. It even uncovered my true nature...
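By the way, the output varies from run to run because text_completion samples with temperature = 0.6 and top_p = 0.9. In this codebase, a temperature of 0 switches generation to greedy argmax decoding, so an untested way to get a repeatable completion is:

torchrun --nproc_per_node 1 example_text_completion_1.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 --temperature 0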

Appendix:

The PyTorch 2.0.1 wheel built for Jetson Orin has been uploaded to a network drive; feel free to grab it:

Link: Baidu Netdisk

Extraction code: 9snu

(The end. Thanks for reading!)
