Running Phi-3-mini on iPhone with ONNX Runtime

2025/7/13 16:56:34

For more tech content, follow the WeChat official account: ONE生产力

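We previously introduced Phi-3-mini, the small language model recently open-sourced by Microsoft. It needs very few computing resources, which makes it well suited to embedded applications and mobile smart devices. Today we look at running Phi-3-mini on an iPhone with ONNX Runtime.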

What is ONNX Runtime

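In the AI era, model portability matters a great deal. ONNX Runtime makes it easy to deploy trained models to different devices: developers do not have to deal with each device's inference framework and can run inference through one unified API. ONNX Runtime has also been optimized for generative AI, so quantized generative models can run on a wide range of devices. The Generative AI extensions for ONNX Runtime (onnxruntime-genai) expose Python, C#, and C/C++ APIs for inference; on iPhone, we use the C++ API to run the generative model.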

Steps

Prerequisites

macOS 14+
Xcode 15+
iOS SDK 17.x
Python 3.10+ (Conda recommended)
Python library: python-flatbuffers
CMake

Build ONNX Runtime for iOS

git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
./build.sh --build_shared_lib --ios --skip_tests --parallel --build_dir ./build_ios --apple_sysroot iphoneos --osx_arch arm64 --apple_deploy_target 17.4 --cmake_generator Xcode --config Release

Notes:

1. Before building, make sure Xcode is configured correctly by running the following in the terminal:

sudo xcode-select -switch /Applications/Xcode.app/Contents/Developer

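2. ONNX Runtime has to be compiled separately for each platform. For iOS devices, build for arm64 (as in the command above, --osx_arch arm64); x86_64 is only relevant for the iOS Simulator on Intel-based Macs.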

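3. We recommend building with the latest iOS SDK so the app keeps pace with current OS features and security updates. To support devices running older iOS versions, lower the deployment target (--apple_deploy_target in the command above).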
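Notes 2 and 3 boil down to two checks: which CPU architecture a binary was built for, and whether a device's iOS version satisfies the deployment target (the --apple_deploy_target value is the minimum iOS version the build supports). The C++ below is a minimal sketch of both checks; `compiled_arch`, `parse_version` and `meets_deployment_target` are hypothetical helper names, not part of ONNX Runtime. The architecture check relies on the `__aarch64__`/`__arm64__` and `__x86_64__` macros that Clang predefines for those targets.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reports the CPU architecture this file was compiled for,
// using macros that Clang/GCC predefine for each target.
std::string compiled_arch() {
#if defined(__aarch64__) || defined(__arm64__)
  return "arm64";
#elif defined(__x86_64__)
  return "x86_64";
#else
  return "unknown";
#endif
}

// Splits a version such as "17.4.1" into {17, 4, 1}.
// Sketch only: std::stoi throws on empty or non-numeric components.
std::vector<int> parse_version(const std::string& version) {
  std::vector<int> parts;
  std::stringstream ss(version);
  std::string item;
  while (std::getline(ss, item, '.')) {
    parts.push_back(std::stoi(item));
  }
  return parts;
}

// Hypothetical helper: true if an OS at `os_version` satisfies `deployment_target`.
// Missing components count as 0, so "18" is treated as "18.0".
bool meets_deployment_target(const std::string& os_version,
                             const std::string& deployment_target) {
  std::vector<int> os = parse_version(os_version);
  std::vector<int> target = parse_version(deployment_target);
  const size_t n = std::max(os.size(), target.size());
  os.resize(n, 0);
  target.resize(n, 0);
  return os >= target;  // std::vector compares element by element (lexicographically)
}
```

With --apple_deploy_target 17.4, `meets_deployment_target("17.5", "17.4")` is true and `meets_deployment_target("16.7", "17.4")` is false. Logging `compiled_arch()` at startup of an iPhone build should report arm64.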

Build Generative AI with ONNX Runtime for iOS

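git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai
git checkout yguo/ios-build-genai

# Create ort/include and ort/lib to hold the ONNX Runtime build from the previous step
mkdir ort
cd ort
mkdir include
mkdir lib
cd ../

# Copy the ONNX Runtime C API header and the iOS dylibs built above
cp ../onnxruntime/include/onnxruntime/core/session/onnxruntime_c_api.h ort/include
cp ../onnxruntime/build_ios/Release/Release-iphoneos/libonnxruntime*.dylib* ort/lib

# Build onnxruntime-genai; despite the directory name, --ios_sysroot iphoneos targets physical devices (arm64)
python3 build.py --parallel --build_dir ./build_ios_simulator --ios --ios_sysroot iphoneos --osx_arch arm64 --apple_deployment_target 17.4 --cmake_generator Xcode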
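The commands above stage the ONNX Runtime header and libraries in ort/include and ort/lib before build.py runs, and a missed copy step is easy to overlook. Below is a minimal C++17 sketch that checks this layout; `ort_layout_ready` is a hypothetical helper, and the expected names are taken from the cp commands (onnxruntime_c_api.h and libonnxruntime*.dylib*).

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: checks that `ort_root` (the ort/ folder created above) contains
// include/onnxruntime_c_api.h and at least one lib/libonnxruntime*.dylib* entry.
bool ort_layout_ready(const fs::path& ort_root) {
  std::error_code ec;
  if (!fs::is_regular_file(ort_root / "include" / "onnxruntime_c_api.h", ec)) {
    return false;
  }
  fs::directory_iterator it(ort_root / "lib", ec);
  if (ec) {
    return false;  // ort/lib is missing or unreadable
  }
  for (const auto& entry : it) {
    const std::string name = entry.path().filename().string();
    // Mirrors the shell glob libonnxruntime*.dylib* (versioned names and symlinks included).
    if (name.rfind("libonnxruntime", 0) == 0 && name.find(".dylib") != std::string::npos) {
      return true;
    }
  }
  return false;
}
```

Running `ort_layout_ready("ort")` from the onnxruntime-genai directory should return true once both cp commands have succeeded.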

Create the app in Xcode

Copy the INT4-quantized ONNX model into the app project

The app needs the model in ONNX format with INT4 quantization, so download it first.

Once the download finishes, add the model files to the Resources directory of your Xcode project.
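Because the code later in this article passes the bundle's resource directory (resourcePath) itself to OgaModel::Create, the model files must end up at the top level of that directory. A quick check before loading turns a confusing load failure into a clear list of what is missing. This is a minimal C++17 sketch: `missing_model_files` is a hypothetical helper, and the expected files (genai_config.json, tokenizer.json and at least one .onnx file) are an assumption about a typical onnxruntime-genai model folder, so match the list to the files you actually downloaded.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the expected model files that are missing from model_dir.
// Assumed layout: genai_config.json, tokenizer.json and at least one *.onnx file.
std::vector<std::string> missing_model_files(const fs::path& model_dir) {
  std::vector<std::string> missing;
  for (const char* name : {"genai_config.json", "tokenizer.json"}) {
    std::error_code ec;
    if (!fs::is_regular_file(model_dir / name, ec)) {
      missing.push_back(name);
    }
  }
  // Look for any file with the .onnx extension; the exact model file name varies.
  bool has_onnx = false;
  std::error_code ec;
  for (fs::directory_iterator it(model_dir, ec), end; !ec && it != end; it.increment(ec)) {
    if (it->path().extension() == ".onnx") {
      has_onnx = true;
      break;
    }
  }
  if (!has_onnx) {
    missing.push_back("*.onnx");
  }
  return missing;
}
```

In ViewController.mm the path can be passed as `llmPath.UTF8String`; logging any returned names before calling OgaModel::Create points directly at a file that was not copied into the bundle.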

Add the C++ API to the ViewController

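1. Add the corresponding C++ header files to the project.

2. Add onnxruntime-genai.dylib to the project in Xcode.

3. Test directly with the code from the C example in this sample. You can also build more on top of it (such as a chat UI).

4. Because the view controller calls C++, rename ViewController.m to ViewController.mm so Xcode compiles it as Objective-C++.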
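The prompt in the code below wraps each message in Phi-3's chat markers (<|system|>, <|user|>, <|assistant|>, with <|end|> closing each message) and ends with <|assistant|> so that the model writes the reply. A chat UI has to format every message the same way. Here is a minimal sketch that reproduces the sample prompt's exact layout, with no newlines between markers; `build_phi3_prompt` is a hypothetical helper. The Phi-3 model card's template also places a newline after each marker, so check which form your model responds to best.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: formats a conversation in the same layout as the sample prompt,
// "<|role|>content<|end|>" for each turn, then "<|assistant|>" to request the reply.
// Each role is expected to be "system", "user" or "assistant".
std::string build_phi3_prompt(const std::vector<std::pair<std::string, std::string>>& turns) {
  std::string prompt;
  for (const auto& [role, content] : turns) {
    prompt += "<|" + role + "|>" + content + "<|end|>";
  }
  prompt += "<|assistant|>";
  return prompt;
}
```

For a chat UI, keep the conversation history as (role, content) pairs, append each new user message and model reply, and pass the result's `c_str()` to `tokenizer->Encode` in place of the fixed prompt.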

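// At the top of ViewController.mm: the onnxruntime-genai C++ API header (added to the project in step 1)
#include "ort_genai.h"

// Inside the view controller (for example, in viewDidLoad).
// The model files sit in the app bundle's resource directory, so use that directory as the model path.
NSString *llmPath = [[NSBundle mainBundle] resourcePath];
char const *modelPath = llmPath.UTF8String;  // UTF8String replaces the deprecated cString

// Load the model and its tokenizer
auto model = OgaModel::Create(modelPath);
auto tokenizer = OgaTokenizer::Create(*model);

// Prompt in Phi-3 chat format
const char *prompt = "<|system|>You are a helpful AI assistant.<|end|><|user|>Can you introduce yourself?<|end|><|assistant|>";

// Tokenize the prompt
auto sequences = OgaSequences::Create();
tokenizer->Encode(prompt, *sequences);

// Generation settings: max_length caps the sequence length (in onnxruntime-genai this count includes the prompt tokens)
auto params = OgaGeneratorParams::Create(*model);
params->SetSearchOption("max_length", 100);
params->SetInputSequences(*sequences);

// Generate, then decode the first (and only) output sequence
auto output_sequences = model->Generate(*params);
const auto output_sequence_length = output_sequences->SequenceCount(0);
const auto *output_sequence_data = output_sequences->SequenceData(0);
auto out_string = tokenizer->Decode(output_sequence_data, output_sequence_length);

// Convert the result to an NSString, for example to show it in a label
NSString *result = [NSString stringWithUTF8String:out_string];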
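`model->Generate(*params)` runs the whole generation in a single call. Conceptually, with greedy search, it repeats one step: run the model on the current sequence, pick the most likely next token, append it, and stop at the end-of-sequence token or once the sequence reaches max_length. The toy C++ sketch below shows that loop. It is not onnxruntime-genai's implementation, and `NextTokenFn` is a hypothetical stand-in for a forward pass plus argmax. Because the sequence starts with the prompt tokens, max_length counts the prompt too, and decoding the whole output sequence (as the code above does) returns the prompt text followed by the answer. `new_tokens_only` is a hypothetical helper that keeps just the reply.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for one model forward pass plus greedy (argmax) token selection.
using NextTokenFn = std::function<int32_t(const std::vector<int32_t>&)>;

// Toy greedy autoregressive decoding: the sequence starts as the prompt tokens and grows
// one token per step until it reaches max_length (prompt included) or emits eos_id.
std::vector<int32_t> greedy_generate(std::vector<int32_t> sequence,
                                     const NextTokenFn& next_token,
                                     std::size_t max_length, int32_t eos_id) {
  while (sequence.size() < max_length) {
    const int32_t token = next_token(sequence);
    sequence.push_back(token);
    if (token == eos_id) {
      break;  // the end-of-sequence token is kept, then generation stops
    }
  }
  return sequence;
}

// Hypothetical helper: drops the first prompt_length tokens, leaving only the reply.
std::vector<int32_t> new_tokens_only(const std::vector<int32_t>& output,
                                     std::size_t prompt_length) {
  if (prompt_length >= output.size()) {
    return {};
  }
  return std::vector<int32_t>(output.begin() + prompt_length, output.end());
}
```

With the real API, the prompt's token count should be available as `sequences->SequenceCount(0)` after `Encode`; passing `output_sequence_data + prompt_length` and `output_sequence_length - prompt_length` to `Decode` would then decode only the reply.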