PandaGPT Deployment Demo

news2026/3/23 13:51:25

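PandaGPT is a general-purpose instruction-following model that can both see and hear. Experiments show that PandaGPT can perform complex tasks such as generating detailed image descriptions, writing stories inspired by videos, and answering questions about audio. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally; for example, it can connect how objects look in a photo with how they sound in an audio clip. To do so, PandaGPT combines the multimodal encoders from ImageBind with the large language model Vicuna. Notably, although PandaGPT demonstrates impressive cross-modal capabilities across six modalities (text, image/video, audio, depth, thermal, and IMU), it is trained only on aligned image-text pairs, thanks to the shared embedding space provided by ImageBind. The PandaGPT authors hope that it serves as an initial step toward building AGI that can perceive and understand inputs in different modalities holistically, as humans do.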

GitHub: yxuansu/PandaGPT, "PandaGPT: One Model To Instruction-Follow Them All" (https://github.com/yxuansu/PandaGPT)

1. Set up the environment

conda create -n pandagpt python=3.9

conda activate pandagpt

Create a virtual environment named pandagpt with Python 3.9 and activate it.

git clone https://github.com/yxuansu/PandaGPT.git

cd /content/drive/MyDrive/PandaGPT

pip install -r requirements.txt

pip install torch==1.13.1+cu117 -f https://download.pytorch.org/whl/torch/

pip install gradio mdtex2html

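Download the PandaGPT source code and install the dependencies. Run git clone from /content/drive/MyDrive so that the repository ends up at /content/drive/MyDrive/PandaGPT, the path used in the rest of this post.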
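Before moving on, it can help to confirm that the pinned torch build is the one actually installed and that it sees a GPU. The following is a minimal sketch, not part of the PandaGPT repository; the helper name describe_torch is made up for illustration, and the check simply reports nothing useful if torch is absent.

```python
import importlib.util


def describe_torch():
    """Return a small report on the installed torch build, or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "version": torch.__version__,        # this post pins 1.13.1+cu117
        "cuda_build": torch.version.cuda,    # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }


report = describe_torch()
if report is None:
    print("torch is not installed in this environment")
else:
    print(report)
```

If cuda_available comes back False, the demo will most likely not be able to load the model on the GPU, so it is worth resolving the driver or wheel mismatch first.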

2. Download the models

curl -L https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth -o /content/drive/MyDrive/imagebind_ckpt/_huge.pth

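Prepare the ImageBind checkpoint and save it as /content/drive/MyDrive/imagebind_ckpt/_huge.pth. The target directory must already exist, because curl -o does not create it (curl's --create-dirs option can). If the demo later fails to find the ImageBind weights, check whether the loader expects the original file name imagebind_huge.pth inside that directory.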

Prepare the Vicuna checkpoint and save it as /content/drive/MyDrive/vicuna_ckpt/7b_v0/vicuna-7b-v0. For details, see the CSDN blog post by Spielberg_1 on merging Llama-7b-hf and vicuna-7b-delta-v0 into vicuna-7b-v0.

pip install huggingface_hub

from huggingface_hub import snapshot_download

save_dir="/content/drive/MyDrive/pandagpt_ckpt/7b"
repo_id="openllmplayground/pandagpt_7b_max_len_1024"

snapshot_download(repo_id=repo_id,local_dir=save_dir, local_dir_use_symlinks=False)

Download the PandaGPT weights and save them to /content/drive/MyDrive/pandagpt_ckpt/7b.

The models can also be downloaded directly from the Hugging Face website.
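With three separate downloads, a missing file is easy to overlook until the demo crashes. Here is a small sketch that checks the locations used in this post. The helper find_missing is hypothetical, and because the post only says the PandaGPT weights end in .pt, the check looks for any .pt file rather than a specific name.

```python
from pathlib import Path

# Locations used in this post; adjust them to your own layout.
IMAGEBIND_CKPT = Path("/content/drive/MyDrive/imagebind_ckpt/_huge.pth")
VICUNA_DIR = Path("/content/drive/MyDrive/vicuna_ckpt/7b_v0/vicuna-7b-v0")
PANDAGPT_DIR = Path("/content/drive/MyDrive/pandagpt_ckpt/7b")


def find_missing(imagebind_ckpt, vicuna_dir, pandagpt_dir):
    """Return human-readable descriptions of checkpoints that are not in place."""
    problems = []
    if not Path(imagebind_ckpt).is_file():
        problems.append(f"ImageBind checkpoint not found: {imagebind_ckpt}")
    if not Path(vicuna_dir).is_dir():
        problems.append(f"Vicuna directory not found: {vicuna_dir}")
    # Only the .pt extension is known from the post, so accept any .pt file.
    pdir = Path(pandagpt_dir)
    if not (pdir.is_dir() and any(pdir.glob("*.pt"))):
        problems.append(f"No .pt file found in: {pandagpt_dir}")
    return problems


for problem in find_missing(IMAGEBIND_CKPT, VICUNA_DIR, PANDAGPT_DIR):
    print(problem)
```

An empty output means all three locations are populated; it does not verify that the files are complete or uncorrupted.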

3. Deployment demo

cd /content/drive/MyDrive/PandaGPT/code

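Go to the PandaGPT code directory and open /PandaGPT/code/web_demo.py.

Change the imagebind_ckpt_path and vicuna_ckpt_path model paths passed in to your own directories.

Set delta_ckpt_path to the path of the PandaGPT weights, ending with the .pt file itself.

In the last line, change the share parameter to True so that a shareable link is created and the demo can be opened in a web page.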
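The three path settings are easy to get subtly wrong, especially delta_ckpt_path, which must name a file rather than a folder. The sketch below assembles them from the directories used in this post. The function build_demo_paths is hypothetical, the use of /content/drive/MyDrive/imagebind_ckpt as the ImageBind directory is an assumption based on where the checkpoint was saved, and how web_demo.py actually stores these values should be checked in the file itself.

```python
from pathlib import Path


def build_demo_paths(imagebind_dir, vicuna_dir, pandagpt_dir):
    """Assemble the three path settings named above, resolving the .pt file."""
    pdir = Path(pandagpt_dir)
    pt_files = sorted(pdir.glob("*.pt")) if pdir.is_dir() else []
    if len(pt_files) != 1:
        raise FileNotFoundError(
            f"expected exactly one .pt file in {pandagpt_dir}, found {len(pt_files)}"
        )
    return {
        "imagebind_ckpt_path": str(imagebind_dir),
        "vicuna_ckpt_path": str(vicuna_dir),
        "delta_ckpt_path": str(pt_files[0]),  # points at the .pt file, not its folder
    }


try:
    paths = build_demo_paths(
        "/content/drive/MyDrive/imagebind_ckpt",  # assumed directory, see lead-in
        "/content/drive/MyDrive/vicuna_ckpt/7b_v0/vicuna-7b-v0",
        "/content/drive/MyDrive/pandagpt_ckpt/7b",
    )
    for key, value in paths.items():
        print(f"{key} = {value}")
except FileNotFoundError as err:
    print(err)
```

The printed values can then be copied into web_demo.py by hand.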

Now it is finally ready to run.

cd /content/drive/MyDrive/PandaGPT/code


CUDA_VISIBLE_DEVICES=0 python web_demo.py
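The CUDA_VISIBLE_DEVICES=0 prefix restricts the process to the first GPU. If you prefer to start the demo from a Python script or notebook cell instead of the shell, the same effect can be sketched as follows; env_with_gpus and run_demo are illustrative names, not part of PandaGPT.

```python
import os
import subprocess
import sys


def env_with_gpus(gpu_ids):
    """Copy the current environment and restrict CUDA to the given GPU indices."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_demo(script="web_demo.py", gpu_ids=(0,)):
    """Run the demo script with the current interpreter; blocks until it exits."""
    return subprocess.run([sys.executable, script], env=env_with_gpus(gpu_ids), check=True)
```

Calling run_demo() from /content/drive/MyDrive/PandaGPT/code is then equivalent to the shell command above. The variable has to be set before the child process initializes CUDA, which is why it is passed through the environment rather than set afterwards.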

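Let's look at the result. Open the link; the page shows four boxes for uploading, in order, an image, audio, a video, and a thermal image.

Upload speed depends on the network: an image may take a few seconds, and audio and video take somewhat longer. An upload-complete indicator would be a welcome addition.

Type a question in the [input] box and click [submit]; once inference finishes, the answer appears in [chatbot].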

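Upload a picture of a cat and ask "What color is this?" Answer: gray and white.

Upload a clip of thunder and ask "What sound is this?"

Upload a picture of a hamburger and ask "Name two manufacturers."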

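Summary: overall, PandaGPT is trained only on image-text pairs yet accepts cross-modal input and generates text, which is a pleasant surprise. Thanks to the authors for sharing; I look forward to further improvements.

Problems encountered: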

ImportError: LlamaTokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones that match your environment. Please note that you may need to restart your runtime after installation.

pip install sentencepiece
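An ImportError like this only shows up once the model starts loading, so it can save time to check for missing packages up front. A minimal sketch follows; the list of import names is an assumption based on the packages this post installs with pip, and requirements.txt may pull in others.

```python
import importlib.util

# Import names for packages installed in this post (assumed to match the pip names).
REQUIRED = ["torch", "gradio", "mdtex2html", "huggingface_hub", "sentencepiece"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

Anything listed as missing can be installed with pip before launching web_demo.py.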

/mnt/PandaGPT/code/web_demo.py:154: GradioUnusedKwargWarning: You have unused kwarg parameters in Blocks, please remove them: {'scale': 4}
with gr.Blocks(scale=4) as demo:
/mnt/PandaGPT/code/web_demo.py:157: GradioUnusedKwargWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 4}
with gr.Row(scale=4):
/mnt/PandaGPT/code/web_demo.py:163: GradioUnusedKwargWarning: You have unused kwarg parameters in Video, please remove them: {'type': 'file'}
video_path = gr.Video(type='file', label="Video")
/mnt/PandaGPT/code/web_demo.py:167: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
chatbot = gr.Chatbot().style(height=300)
/mnt/PandaGPT/code/web_demo.py:171: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
user_input = gr.Textbox(show_label=False, placeholder="Input...", lines=10).style(container=False)
/mnt/PandaGPT/code/web_demo.py:173: GradioUnusedKwargWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 1}
with gr.Row(scale=1):
/mnt/PandaGPT/code/web_demo.py:175: GradioUnusedKwargWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 1}
with gr.Row(scale=1):
Running on local URL: http://0.0.0.0:24000

To create a public link, set `share=True` in `launch()`.
Error: no DISPLAY environment variable specified

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 23.70 GiB total capacity; 7.16 GiB already allocated; 12.69 MiB free; 7.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Check which processes are occupying GPU memory and kill them.
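In the error above, only about 7 GiB of the 23.70 GiB card is allocated by PyTorch while almost nothing is free, which suggests other processes are holding the rest. One way to list them is to query nvidia-smi from Python, sketched below. The helper names are illustrative; the query fields pid, process_name and used_memory are passed to nvidia-smi's --query-compute-apps option, and the script returns an empty list if nvidia-smi is unavailable.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse CSV rows of 'pid, process_name, used_memory' into (pid, name, MiB) tuples."""
    rows = []
    for line in text.splitlines():
        if line.count(",") < 2:
            continue  # skip blank or unexpected lines
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        try:
            pid_value = int(pid.strip())
        except ValueError:
            continue
        try:
            mem_mib = int(mem.strip())
        except ValueError:
            mem_mib = None  # memory may be reported as not available
        rows.append((pid_value, name.strip(), mem_mib))
    return rows


def gpu_processes():
    """Run nvidia-smi and return the parsed process list, or [] if it cannot run."""
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_compute_apps(out)


for pid, name, mem_mib in gpu_processes():
    print(f"pid={pid} name={name} used={mem_mib} MiB")
```

Once the stale process is identified, it can be stopped with kill followed by its pid in the shell, after confirming it is not a job you still need.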

Reference: CSDN blog post by 李卓璐, "Error when running Alpaca on Ubuntu: torch.cuda.OutOfMemoryError: CUDA out of memory. Solution (check CUDA usage and clear the GPU cache)".