Building llama.cpp from Source for Android


A pre-built version is available for direct download:

https://github.com/turingevo/llama.cpp-build/releases/tag/b4331

Prepare the Android NDK

The NDK is already downloaded at:

/media/wmx/ws1/software/qtAndroid/Sdk/ndk/23.1.7779620
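
If this NDK version is not installed yet, it can be fetched with the Android SDK's sdkmanager (a sketch, assuming the command-line tools are on your PATH):

# install the exact NDK release used in this post
sdkmanager "ndk;23.1.7779620"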

Version: llama.cpp-b4331
Download the source code
Change into the llama.cpp/ directory
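
If you still need to fetch the source, a minimal sketch pinned to this tag (upstream repository assumed):

# shallow-clone only the b4331 tag of llama.cpp
git clone --branch b4331 --depth 1 https://github.com/ggerganov/llama.cpp.git
cd llama.cpp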

Build script llama.cpp/build-android.sh:


#!/bin/bash

# path to the Android NDK (adjust to your installation)
ANDROID_NDK_PATH=/media/wmx/ws1/software/qtAndroid/Sdk/ndk/23.1.7779620
build_dir=build-android
src_dir=.
install_dir=bin/android

# configure with the NDK's CMake toolchain: 64-bit ARM, API level 28,
# OpenMP and llamafile disabled for the Android build
cmake \
  -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_PATH}/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DCMAKE_C_FLAGS="-march=armv8.7a" \
  -DCMAKE_CXX_FLAGS="-march=armv8.7a" \
  -DGGML_OPENMP=OFF \
  -DGGML_LLAMAFILE=OFF \
  -B ${build_dir} \
  -S ${src_dir}

# build in parallel (adjust -j to your core count)
cmake --build ${build_dir} --config Release -j48

# install binaries and shared libraries into bin/android
cmake --install ${build_dir} --prefix ${install_dir} --config Release
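
To build, save the script as build-android.sh in the repository root and run it (a usage sketch, not part of the original post); the cross-compiled binaries and shared libraries land under bin/android/:

chmod +x build-android.sh
./build-android.sh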

Push to the Android device for testing

Below are the test results on a Huawei Mate 40 Pro.

The build output is in llama.cpp/bin/android; push it to the device together with a model:


adb shell "mkdir /data/local/tmp/llama.cpp"
adb push bin/android /data/local/tmp/llama.cpp/
adb push qwen2.5-0.5b-instruct-q4_k_m.gguf /data/local/tmp/llama.cpp/
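
Optionally confirm the push succeeded (a quick check, not in the original post):

adb shell ls -R /data/local/tmp/llama.cpp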


adb shell
cd /data/local/tmp/llama.cpp/android

# write the test command into test.sh and make it executable
echo 'LD_LIBRARY_PATH=lib ./bin/llama-simple -m qwen2.5-0.5b-instruct-q4_k_m.gguf -p "你是谁?"' > test.sh
chmod a+x test.sh

./test.sh


HWNOH:/data/local/tmp/llama.cpp/android $ ./test.sh                                                                                           
llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /sdcard/a-wmx/models/qwen2.5-0.5b-instruct-q4_k_m.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = qwen2.5-0.5b-instruct
llama_model_loader: - kv   3:                            general.version str              = v0.1
llama_model_loader: - kv   4:                           general.finetune str              = qwen2.5-0.5b-instruct
llama_model_loader: - kv   5:                         general.size_label str              = 630M
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 24
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 896
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 4864
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 14
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 15
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q5_0:  133 tensors
llama_model_loader: - type q8_0:   13 tensors
llama_model_loader: - type q4_K:   12 tensors
llama_model_loader: - type q6_K:   12 tensors
llm_load_vocab: control token: 151659 '<|fim_prefix|>' is not marked as EOG
llm_load_vocab: control token: 151656 '<|video_pad|>' is not marked as EOG
llm_load_vocab: control token: 151655 '<|image_pad|>' is not marked as EOG
llm_load_vocab: control token: 151653 '<|vision_end|>' is not marked as EOG
llm_load_vocab: control token: 151652 '<|vision_start|>' is not marked as EOG
llm_load_vocab: control token: 151651 '<|quad_end|>' is not marked as EOG
llm_load_vocab: control token: 151649 '<|box_end|>' is not marked as EOG
llm_load_vocab: control token: 151648 '<|box_start|>' is not marked as EOG
llm_load_vocab: control token: 151646 '<|object_ref_start|>' is not marked as EOG
llm_load_vocab: control token: 151644 '<|im_start|>' is not marked as EOG
llm_load_vocab: control token: 151661 '<|fim_suffix|>' is not marked as EOG
llm_load_vocab: control token: 151647 '<|object_ref_end|>' is not marked as EOG
llm_load_vocab: control token: 151660 '<|fim_middle|>' is not marked as EOG
llm_load_vocab: control token: 151654 '<|vision_pad|>' is not marked as EOG
llm_load_vocab: control token: 151650 '<|quad_start|>' is not marked as EOG
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 151936
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 896
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_head           = 14
llm_load_print_meta: n_head_kv        = 2
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: n_embd_k_gqa     = 128
llm_load_print_meta: n_embd_v_gqa     = 128
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 4864
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 630.17 M
llm_load_print_meta: model size       = 462.96 MiB (6.16 BPW) 
llm_load_print_meta: general.name     = qwen2.5-0.5b-instruct
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: FIM PRE token    = 151659 '<|fim_prefix|>'
llm_load_print_meta: FIM SUF token    = 151661 '<|fim_suffix|>'
llm_load_print_meta: FIM MID token    = 151660 '<|fim_middle|>'
llm_load_print_meta: FIM PAD token    = 151662 '<|fim_pad|>'
llm_load_print_meta: FIM REP token    = 151663 '<|repo_name|>'
llm_load_print_meta: FIM SEP token    = 151664 '<|file_sep|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151662 '<|fim_pad|>'
llm_load_print_meta: EOG token        = 151663 '<|repo_name|>'
llm_load_print_meta: EOG token        = 151664 '<|file_sep|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: tensor 'token_embd.weight' (q5_0) (and 290 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
llm_load_tensors:   CPU_Mapped model buffer size =   462.96 MiB
.....................................................
llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 64
llama_new_context_with_model: n_ctx_per_seq = 64
llama_new_context_with_model: n_batch       = 32
llama_new_context_with_model: n_ubatch      = 32
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 1000000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_per_seq (64) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_kv_cache_init:        CPU KV buffer size =     0.75 MiB
llama_new_context_with_model: KV self size  =    0.75 MiB, K (f16):    0.38 MiB, V (f16):    0.38 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.58 MiB
llama_new_context_with_model:        CPU compute buffer size =    18.66 MiB
llama_new_context_with_model: graph nodes  = 846
llama_new_context_with_model: graph splits = 1
-p 你是谁?我是阿里云开发的超大规模语言模型,我叫通义千问。通义是“通义天下”,千问是“千问天下
main: decoded 32 tokens in 2.21 s, speed: 14.49 t/s

llama_perf_sampler_print:    sampling time =       5.69 ms /    32 runs   (    0.18 ms per token,  5622.91 tokens per second)
llama_perf_context_print:        load time =    1907.15 ms
llama_perf_context_print: prompt eval time =     165.11 ms /     5 tokens (   33.02 ms per token,    30.28 tokens per second)
llama_perf_context_print:        eval time =    2000.08 ms /    31 runs   (   64.52 ms per token,    15.50 tokens per second)
llama_perf_context_print:       total time =    3950.19 ms /    36 tokens
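
In short: llama-simple answers the prompt 你是谁? ("Who are you?") with the model introducing itself in Chinese as 通义千问 (Tongyi Qianwen), the large language model developed by Alibaba Cloud, decoding 32 tokens at roughly 14.5 t/s on the Mate 40 Pro's CPU.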

