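Installing GraphRAG

Note: the installation described in this post did not fully succeed. Indexing kept getting stuck at the step that builds the graph nodes.

My approach was GraphRAG + Ollama (large language model) + Xinference (text embeddings). The other approach I found is to modify the GraphRAG source code.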

1 Introduction

1.1 GraphRAG

GraphRAG is a graph-based retrieval-augmented generation (RAG) method open-sourced by Microsoft.

# Getting started guide
https://microsoft.github.io/graphrag/posts/get_started/

# GitHub repository
https://github.com/microsoft/graphrag

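1.2 Local LLM management tools

Well-known tools for managing and serving large models locally include Ollama, Xinference, LM Studio, and LocalAI.

(1) Ollama

Ollama is a tool for deploying large language models locally, designed to simplify deploying and managing them. It offers convenient model management, a large library of prebuilt models, cross-platform support, and flexible customization, so developers and researchers can use large language models for natural language processing tasks on their own machines.

# Website
https://ollama.com/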

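(2) Xinference

Xorbits Inference (Xinference) is an open-source platform developed in China that simplifies running and integrating a wide range of AI models. With Xinference you can run inference with any open-source LLM, embedding model, or multimodal model, in the cloud or on-premises, and build AI applications on top of them.

# Documentation
https://inference.readthedocs.io/en/latest/

# Docker installation guide
https://inference.readthedocs.io/zh-cn/latest/getting_started/using_docker_image.html

# Quick start
https://inference.readthedocs.io/zh-cn/latest/models/model_abilities/index.html

# GitHub repository
https://github.com/xorbitsai/inference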

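(3) LocalAI

LocalAI is a free, open-source alternative to OpenAI. It is a drop-in replacement REST API for local AI inference, compatible with the OpenAI (and Elevenlabs, Anthropic, ...) API specifications. It lets you run LLMs and generate images and audio, among other things, locally or on-premises on consumer-grade hardware, and it supports multiple model families. No GPU is required.

# Website
https://localai.io/

# GitHub repository
https://github.com/go-skynet/LocalAI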

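(4) LM Studio

LM Studio is a desktop application for running large language models (LLMs) locally. It provides a beginner-friendly graphical user interface (GUI), runs on several operating systems, and focuses on ease of use, which makes it a good fit for non-technical users who want to run models such as GPT and LLaMA on their own devices.

# Website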
https://lmstudio.ai/

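1.3 Other

Combining a local model management tool with other open-source projects makes it possible to put together a working system quickly.

Chatbox provides a chat interface for the models.

# GitHub repository
https://github.com/Bin-Huang/chatbox

# Website
https://chatboxai.app/

Mermaid renders flowcharts in the front end.

# Website
https://mermaid.js.org/

# GitHub repository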
https://github.com/mermaid-js/mermaid

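2 Installing Xinference

2.1 Installing Xinference with Docker

⚠️ By default Xinference downloads models from HuggingFace. Set the XINFERENCE_MODEL_SRC environment variable to download from ModelScope instead, which is reachable from mainland China.

# Parameters
# XINFERENCE_MODEL_SRC: the model download source
# XINFERENCE_HOME: the Xinference root directory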
docker run -itd \
--name=xinference \
-p 9997:9997 \
-e XINFERENCE_MODEL_SRC=modelscope \
-e XINFERENCE_HOME=/data \
-v /home/xinference/data:/data \
--gpus all \
xprobe/xinference:v0.15.4 xinference-local -H 0.0.0.0 --log-level debug

Access URL

# Web UI
http://192.168.137.64:9997/

⚠️ **Docker GPU error:** The error docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. means that Docker is missing the NVIDIA container runtime package. The fix below is for CentOS 7, the system I use.

# Configure the yum repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo

# Install the missing package
sudo yum install -y nvidia-container-runtime

# Restart Docker
sudo systemctl restart docker

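2.2 Configuring models

The steps below use the web UI. You can also set up your own dependency environment and create models to suit your needs.

(1) Using the web UI

The navigation bar at the top of the home page groups models into categories such as language models and embedding models. Pick and configure whichever you need.

(2) Model configuration

Taking Qwen2.5 as an example, you can choose an engine such as Transformers, vLLM, or SGLang, select the remaining parameters, and then click the Launch (rocket) button at the bottom.

(3) Launch complete

Once launched, the Qwen large language model appears among the running models. Clicking the button on its right opens a chat window for talking to the model directly.

The BGE embedding model is listed in the same way once it has been launched.
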
2.3 Calling the models

(1) Calling the Qwen model

curl -X 'POST' \
  'http://127.0.0.1:9997/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen2.5-instruct",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is the largest animal?"
        }
    ]
  }'
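
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server address and model name from the curl call above; the helper names (build_chat_request, extract_reply, chat) are made up for illustration, and the response layout is assumed to follow the OpenAI chat completions format.

```python
import json
import urllib.request

# Assumption: Xinference is reachable at the address used in the curl example.
BASE_URL = "http://127.0.0.1:9997/v1"


def build_chat_request(model: str, user_text: str,
                       system_text: str = "You are a helpful assistant.") -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response body."""
    return body["choices"][0]["message"]["content"]


def chat(model: str, user_text: str, timeout: float = 60.0) -> str:
    """Send the request to the running server and return the reply text."""
    request = build_chat_request(model, user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))

# Example (needs a running server with the model launched):
# print(chat("qwen2.5-instruct", "What is the largest animal?"))
```

Keeping the request builder separate from the network call makes it easy to inspect the payload before anything is sent.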


(2) Calling the BGE embedding model

curl -X 'POST' \
  'http://127.0.0.1:9997/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "bge-base-en",
    "input": "What is the capital of China?"
  }'
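
To check that the embeddings are usable, it helps to compare two of them. The following is a minimal sketch, assuming the endpoint and model name from the curl call above and an OpenAI-style response in which the vector sits under data[0].embedding; the function names are illustrative only.

```python
import json
import math
import urllib.request

# Assumption: same endpoint as the curl example above.
EMBEDDINGS_URL = "http://127.0.0.1:9997/v1/embeddings"


def extract_embedding(body: dict) -> list:
    """Return the first vector from an OpenAI-style embeddings response."""
    return body["data"][0]["embedding"]


def embed(text: str, model: str = "bge-base-en", timeout: float = 60.0) -> list:
    """Request an embedding for one string from the running server."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    request = urllib.request.Request(
        EMBEDDINGS_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_embedding(json.load(response))


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Example (needs a running server with the embedding model launched):
# q = embed("What is the capital of China?")
# d = embed("Beijing is the capital of China.")
# print(cosine_similarity(q, d))
```

Related sentences should score noticeably higher than unrelated ones; if every pair scores about the same, the wrong model or endpoint is probably being called.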


API styles supported by Xinference (they follow the OpenAI API reference):

Chat completions: https://platform.openai.com/docs/api-reference/chat
Completions: https://platform.openai.com/docs/api-reference/completions
Embeddings: https://platform.openai.com/docs/api-reference/embeddings

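3 Installing Ollama

3.1 Installing Ollama from the command line

# Ollama website
https://ollama.com/

# Installation guide
https://github.com/ollama/ollama/blob/main/docs/linux.md

# Automatic install: download the install script and run it
curl -fsSL https://ollama.com/install.sh | sh

# Method 1: manual install from the official site (recommended)
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
# Extract the archive into /usr
sudo tar -C /usr -xzvf ollama-linux-amd64.tgz


# Method 2: manual download from GitHub
https://github.com/ollama/ollama/releases
# Pick a release and the file that matches your platform; I used the file below
https://github.com/ollama/ollama/releases/download/v0.3.12/ollama-linux-amd64.tgz
# After downloading, extract it; tar lists each file as it is extracted
sudo tar -C /usr -xzvf ollama-linux-amd64.tgz

# After extracting, open a new terminal before running the commands below; otherwise you may see the error "/usr/local/bin/ollama: No such file or directory"
# Start the server in the foreground
ollama serve
# Or start it in the background; ollama.log is written to the current directory
nohup ollama serve > ollama.log 2>&1 &

# Show the version
ollama -v

# List the local models
ollama list

# Show help
ollama -h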

3.2 Initializing models online

# Pull an embedding model with Ollama
# Note: pick a newer or larger version if you need one
# Download the all-minilm:l6-v2 embedding model
# URL: https://ollama.com/library/all-minilm:l6-v2
ollama pull all-minilm:l6-v2


# Download the Qwen2.5 text model
# URL: https://ollama.com/library/qwen2.5:1.5b-instruct
ollama pull qwen2.5:1.5b-instruct

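3.3 Initializing models offline

1 Find a GGUF file

A GGUF model file stores the parameters of a pretrained model in a binary format. The format is typically used for quantized models that can run inference on a CPU.

# Find the model on ModelScope or HuggingFace and download it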
Qwen2.5-1.5B-Instruct-GGUF
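
Downloads through mirrors sometimes produce an HTML error page or a truncated file rather than a model. As a quick sanity check, the sketch below relies on the fact that a GGUF file begins with the four ASCII bytes GGUF; the function name is made up for illustration, and the path in the usage comment is an assumption based on the directory layout in this section.

```python
GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file


def looks_like_gguf(path: str) -> bool:
    """Return True if the file at `path` starts with the GGUF magic bytes.

    This only inspects the header; it does not prove the file is complete.
    """
    with open(path, "rb") as handle:
        return handle.read(len(GGUF_MAGIC)) == GGUF_MAGIC

# Example (path is an assumption; use your own downloaded file):
# print(looks_like_gguf("./Qwen2.5-1.5B-Instruct-GGUF"))
```

If the check fails, download the file again before creating the model from it.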

2 Write the Modelfile

Model directory layout

--Modelfile
--Qwen2.5-1.5B-Instruct-GGUF

The Modelfile contains a single line:

FROM ./Qwen2.5-1.5B-Instruct-GGUF

3 Create the model

# Create the model; the name qwen2.5:1.5b-instruct is arbitrary
ollama create qwen2.5:1.5b-instruct -f ./Modelfile

# Run the model
ollama run qwen2.5:1.5b-instruct

3.4 Calling the models

# Test the Qwen model
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:1.5b-instruct",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

# Test the embedding model
curl http://localhost:11434/api/embeddings -d '{
  "model": "all-minilm:l6-v2",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'
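
The two curl calls use Ollama's native API, not the OpenAI-compatible /v1 routes. A minimal Python equivalent is sketched below, assuming the default address and the model names from this section; the helper names are illustrative. With "stream" set to false, the generated text is read from the "response" field, and the embedding from the "embedding" field.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default address, as in the curl examples.
OLLAMA_URL = "http://localhost:11434"


def generate_payload(model: str, prompt: str) -> dict:
    """Request body for /api/generate with streaming turned off."""
    return {"model": model, "prompt": prompt, "stream": False}


def post_json(path: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST a JSON body to the Ollama server and decode the JSON reply."""
    request = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def generate(model: str, prompt: str) -> str:
    """Return the full completion text for a prompt."""
    return post_json("/api/generate", generate_payload(model, prompt))["response"]


def embed(model: str, prompt: str) -> list:
    """Return the embedding vector for a prompt."""
    return post_json("/api/embeddings", {"model": model, "prompt": prompt})["embedding"]

# Example (needs `ollama serve` running and both models pulled):
# print(generate("qwen2.5:1.5b-instruct", "Why is the sky blue?"))
# print(len(embed("all-minilm:l6-v2", "The sky is blue because of Rayleigh scattering")))
```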

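4 Installing LocalAI

4.1 Installing LocalAI with Docker

Install LocalAI with Docker. Note that the LocalAI image is large because it bundles several models by default.

# Choose the image that matches the CUDA version on the host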
docker run -itd \
--name local-ai \
-p 8080:8080 \
-v /home/localai/models:/build/models \
--gpus all \
localai/localai:v2.22.0-aio-gpu-nvidia-cuda-12


# A CPU-only image is also available
docker run -itd \
--name local-ai \
-p 8080:8080 \
localai/localai:v2.22.0-aio-cpu

Access URLs

# Web UI
http://192.168.137.64:8080/

# Model browser; it may load slowly depending on your network speed
http://192.168.137.64:8080/browse/

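4.2 Configuring models

The home page shows that several models are preinstalled, including gpt-4, gpt-4o, jina-reranker-v1-base-en, stablediffusion, text-embedding-ada-002, tts-1, and whisper-1. Because of network restrictions, they cannot be used directly.

⚠️ **Problem 1:** LocalAI downloads model files from HuggingFace, which cannot be reached directly from mainland China. The workaround is described below.

Download a GGUF or bin model file

The HuggingFace account at https://huggingface.co/bartowski hosts many GGUF files that can be used directly.

Note: each model has a different download URL. You can find the model in the LocalAI web UI and click Install; if the installation fails, the error message reports the failed download together with the model path.

# Download the GGUF file you need from the HuggingFace repository below
# Because https://huggingface.co/bartowski cannot be reached directly, use the mirror instead
https://hf-mirror.com/bartowski

Upload the file to the directory shared with the container. Once it is there, the model shows up under the Home menu and can also be selected under the Chat menu.

# The shared directory I mounted into the container
/home/localai/models

After the upload, the model is visible in the web UI.
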

4.3 Calling the model

Calling the model from the command line:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen2.5-0.5B-Instruct-Q4_K_M.gguf",
  "messages": [{"role": "user", "content": "Say this is a test!"}],
  "temperature": 0.7
}'
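
For a manually uploaded file, the "model" field has to match the name LocalAI registered, which here is the GGUF file name. The sketch below lists the registered names first; it assumes LocalAI exposes the OpenAI-style /v1/models route on the port used above, and the helper names are illustrative only.

```python
import json
import urllib.request

# Assumption: LocalAI is reachable on the port mapped in the docker command.
LOCALAI_URL = "http://localhost:8080"


def model_ids(models_response: dict) -> list:
    """Extract model names from an OpenAI-style model list response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_models(timeout: float = 30.0) -> list:
    """Ask the running server which models it knows about."""
    with urllib.request.urlopen(f"{LOCALAI_URL}/v1/models", timeout=timeout) as response:
        return model_ids(json.load(response))


def chat_payload(model: str, user_text: str, temperature: float = 0.7) -> dict:
    """Request body matching the curl example above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,
    }

# Example (needs a running LocalAI container):
# print(list_models())
```

Copying the name from the list avoids the typos that are easy to make with long quantized file names.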


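5 Installing graphrag

⚠️ Python version: 3.10.6; graphrag version: 0.3.6

5.1 Installing dependencies

# Create a directory; the virtual environment and the application both live in graphrag
mkdir graphrag
cd graphrag

# Note: Python 3.10-3.12 is required
# Create the virtual environment
python -m venv pygraphrag

# Activate the virtual environment
source pygraphrag/bin/activate

# Install graphrag from the Tsinghua PyPI mirror, pinned to the version used in this post
# (newer releases changed the command-line interface)
pip install graphrag==0.3.6 -i https://pypi.tuna.tsinghua.edu.cn/simple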

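5.2 Initializing GraphRAG

# Create the ragtest project directory and its input folder
mkdir -p ./ragtest/input

# Download the sample data
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./ragtest/input/book.txt

# Initialize the configuration
# --root is the directory to initialize (ragtest)
python -m graphrag.index --init --root ./ragtest

# Files generated by the initialization
# Prompt templates used by GraphRAG; adjust them as needed
--prompts
----claim_extraction.txt
----community_report.txt
----entity_extraction.txt
----summarize_descriptions.txt
# Stores third-party API keys, e.g. for ChatGPT or Qwen
--.env
# Configuration file
--settings.yaml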
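
Before editing the configuration, it can be worth confirming that the initialization produced everything in the listing above. This is a small sketch under that assumption: the expected file list is copied from the listing (graphrag 0.3.6), and the function name is made up for illustration.

```python
from pathlib import Path

# Files from the listing above, relative to the project root.
EXPECTED_FILES = [
    "settings.yaml",
    ".env",
    "prompts/claim_extraction.txt",
    "prompts/community_report.txt",
    "prompts/entity_extraction.txt",
    "prompts/summarize_descriptions.txt",
]


def missing_init_files(root: str) -> list:
    """Return the expected files that do not exist under `root`."""
    base = Path(root)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]

# Example:
# print(missing_init_files("./ragtest") or "initialization looks complete")
```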

5.3 Setting the configuration parameters

Edit the model parameters in settings.yaml:

encoding_model: cl100k_base
skip_workflows: []
llm:
  # 0 Basic settings
  api_key: ollama
  type: openai_chat # or azure_openai_chat
  model: qwen2.5:1.5b-instruct
  model_supports_json: true # recommended if this is available for your model.
  # 1 Adjust the token limit
  max_tokens: 4000
  # request_timeout: 180.0
  # 2 Point to the local Ollama endpoint
  api_base: http://127.0.0.1:11434/v1/

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  llm:
    # 0 Basic settings
    api_key: Xinference
    type: openai_embedding # or azure_openai_embedding
    model: bge-base-en
    api_base: http://127.0.0.1:9997/v1/
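
With type set to openai_chat and openai_embedding, requests go through an OpenAI-style client, which as far as I know appends the route name to api_base. The sketch below works out the two URLs that the configuration above should resolve to, so they can be tested with curl before indexing. The SETTINGS dictionary is a hand-copied subset of settings.yaml and the function names are illustrative; nothing here is GraphRAG's own code.

```python
# Hand-copied subset of the settings.yaml shown above (an assumption, not parsed from disk).
SETTINGS = {
    "llm": {
        "api_base": "http://127.0.0.1:11434/v1/",
        "model": "qwen2.5:1.5b-instruct",
    },
    "embeddings": {
        "llm": {
            "api_base": "http://127.0.0.1:9997/v1/",
            "model": "bge-base-en",
        },
    },
}


def endpoint_url(api_base: str, route: str) -> str:
    """Join a base URL and a route without doubling or dropping slashes."""
    return api_base.rstrip("/") + "/" + route.lstrip("/")


def preflight_targets(settings: dict) -> dict:
    """URLs an OpenAI-style client would be expected to call for chat and embeddings."""
    return {
        "chat": endpoint_url(settings["llm"]["api_base"], "chat/completions"),
        "embeddings": endpoint_url(settings["embeddings"]["llm"]["api_base"], "embeddings"),
    }

# Example:
# for name, url in preflight_targets(SETTINGS).items():
#     print(name, url)
```

Here the chat URL points at Ollama and the embeddings URL at Xinference, so both services have to be running, with the named models loaded, before the index is built.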

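5.4 Building the index

# Build the index
python -m graphrag.index --root ./ragtest

⚠️ Indexing never succeeded for me; it kept failing with the error below. The only workaround I have come across is to modify the GraphRAG source code.

I have not resolved the error. The log file is indexing-engine.log in the output directory.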

datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
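
The log can be long, so it helps to pull out only the failing steps. The sketch below scans log text for lines shaped like the error above and reports the verb, the workflow, and the message. The regular expression is derived from that single log line and the function name is made up, so treat both as assumptions. One plausible cause, which this post did not verify, is that the small local model returned entity-extraction output that GraphRAG could not parse, leaving nothing for cluster_graph to work on.

```python
import re

# Pattern derived from the single error line shown above (an assumption about the format).
VERB_ERROR = re.compile(
    r'Error executing verb "(?P<verb>[^"]+)" in (?P<workflow>[\w.-]+): (?P<message>.*)'
)


def find_verb_errors(log_text: str) -> list:
    """Return one dict per failing verb found in the log text."""
    errors = []
    for line in log_text.splitlines():
        match = VERB_ERROR.search(line)
        if match:
            errors.append(match.groupdict())
    return errors

# Example (the path depends on where the run wrote its output):
# from pathlib import Path
# log_path = next(Path("./ragtest/output").rglob("indexing-engine.log"))
# for error in find_verb_errors(log_path.read_text(encoding="utf-8", errors="replace")):
#     print(error)
```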
