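LLMs and RAG: MemoRAG (Using Its Memory Model to Achieve a Global Understanding of the Entire Database): A Detailed Guide to Its Introduction, Installation and Usage, and Application Cases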

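Introduction to MemoRAG

0. Changelog

1. Features

2. Roadmap

Installing and Using MemoRAG

1. Installation

Installing dependencies

T1. Installing from source

T2. Installing via pip

2. Usage

MemoRAG Lite mode

Basic usage of MemoRAG

Using long-context LLMs as the memory model

Summarization task

Using an API as the generator

Supported generator APIs

Using the memory model

Using memory-augmented retrieval

Benchmark evaluation

Evaluation

Datasets

3. MemoRAG demos

T1. Script demo

T2. Trying MemoRAG for free on Google Colab

MemoRAG application cases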

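Introduction to MemoRAG

MemoRAG: Moving Towards Next-Gen RAG via Memory-Inspired Knowledge Discovery. Empowering RAG with a memory-based data interface for all-purpose applications!

MemoRAG is an innovative RAG framework built on top of an efficient, super-long memory model. Unlike standard RAG, which mainly handles queries with explicit information needs, MemoRAG uses its memory model to achieve a global understanding of the entire database. By recalling query-specific clues from memory, MemoRAG improves evidence retrieval and thus generates more accurate and contextually rich responses.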

GitHub repository: GitHub - qhjqhj00/MemoRAG: Empowering RAG with a memory-based data interface for all-purpose applications!
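To make the contrast with standard RAG concrete, here is a toy, self-contained sketch of the recall-then-retrieve idea. It is not MemoRAG's implementation: crude word-overlap scoring stands in for a dense retriever such as BAAI/bge-m3, and the hypothetical `fake_recall` and `fake_generate` functions stand in for the memory model and the generator. The only point is that retrieving with memory-generated clues can surface a passage that the raw query misses.

```python
import re
from typing import Callable, List


def words(text: str) -> set:
    """Lower-cased word set, used for a crude overlap score."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(passages: List[str], queries: List[str], k: int = 1) -> List[str]:
    """Rank passages by their best word overlap with any of the queries."""
    def score(passage: str) -> int:
        return max(len(words(passage) & words(q)) for q in queries)
    return sorted(passages, key=score, reverse=True)[:k]


def memory_guided_answer(query: str, passages: List[str],
                         recall: Callable[[str], List[str]],
                         generate: Callable[[str, List[str]], str],
                         k: int = 1) -> str:
    clues = recall(query)                              # 1. draft clues from "global memory"
    evidence = retrieve(passages, [query] + clues, k)  # 2. retrieve with query + clues
    return generate(query, evidence)                   # 3. answer from the evidence


passages = [
    "Quidditch matches are played on broomsticks.",
    "Fifty years earlier, Hagrid was expelled after the Chamber was opened.",
    "Tom Riddle framed Hagrid for opening the Chamber of Secrets.",
]
query = "Who really opened it?"


def fake_recall(q: str) -> List[str]:
    # Hypothetical stand-in for clues produced by the memory model.
    return ["Tom Riddle framed Hagrid"]


def fake_generate(q: str, evidence: List[str]) -> str:
    # Hypothetical stand-in for the generator: it just echoes the evidence.
    return " | ".join(evidence)


print(retrieve(passages, [query]))                                         # query only
print(memory_guided_answer(query, passages, fake_recall, fake_generate))   # query + clues
```

With the raw query alone, the overlap scorer picks the passage about Hagrid's expulsion (it shares the word "opened"); adding the recalled clue pulls up the Tom Riddle passage instead.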

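0. Changelog

[21/09/24] MemoRAG introduces Lite mode, enabling memory-augmented RAG over millions of tokens with just a few lines of code. See the example notebook for details.
[13/09/24] MemoRAG adds Meta-Llama-3.1-8B-Instruct and Llama3.1-8B-Chinese-Chat as memory models; see the examples.
[10/09/24] We released the MemoRAG technical report.
[09/09/24] You can try MemoRAG for free on Google Colab.
[05/09/24] A Qwen2-based memory model is available at TommyChien/memorag-qwen2-7b-inst.
[03/09/24] A Mistral-based memory model is available at TommyChien/memorag-mistral-7b-inst.
[01/09/24] The project launched!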

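1. Features

>> Global memory: handles up to one million tokens in a single context, providing a comprehensive understanding of massive datasets.
>> Optimizable and flexible: adapts easily to new tasks, reaching optimized performance with only a few hours of additional training.
>> Contextual clues: generates precise clues from global memory, linking the raw input to the answer and uncovering hidden insights in complex data.
>> Efficient caching: speeds up context prefilling by up to 30x, with support for caching chunks, indexes, and encodings.
>> Context reuse: encodes a long context once and reuses it, improving efficiency for tasks that access the same data repeatedly.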

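2. Roadmap

MemoRAG is under active development, and resources and prototypes are continuously released in this repository.
Code / model / dataset release
Support for OpenAI/Azure models
Technical report release
Chinese language support
Demo code release
Memory model training code release
Lightweight optimization
Faster inference
Integration of arbitrary retrieval methods
Richer memory capabilities
Note: In the near term, MemoRAG aims to achieve lightweight optimization through engineering improvements and to strengthen its memory capabilities, so that it can adapt to a wider range of applications and support longer contexts (e.g., more than one million tokens).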

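Installing and Using MemoRAG

1. Installation

To use Memorizer and MemoRAG, you need Python and the required libraries. Install the necessary dependencies with the following commands:

Installing dependencies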

pip install torch==2.3.1
conda install -c pytorch -c nvidia faiss-gpu=1.8.0

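T1. Installing from source

First clone this repository, then install it in editable mode:

git clone https://github.com/qhjqhj00/MemoRAG.git
cd MemoRAG
pip install -e .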

T2. Installing via pip

pip install memorag

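For a quick start, we provide a notebook that walks through all of MemoRAG's features.

2. Usage

MemoRAG Lite mode

We introduce MemoRAG's Lite mode, designed to provide a quick and user-friendly experience with the MemoRAG pipeline. With just a few lines of code, you can try MemoRAG. Although we recommend starting with a GPU that has 24 GiB of memory, in most cases a 16 GiB GPU can also handle the pipeline with the default settings.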

from memorag import MemoRAGLite
pipe = MemoRAGLite()
context = open("examples/harry_potter.txt").read()
pipe.memorize(context, save_dir="harry_potter", print_stats=True)

query = "What's the book's main theme?"
print(pipe(query))

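MemoRAG Lite is easy to use and supports English or Chinese contexts of up to millions of tokens. It may work with other languages, but performance may degrade because the default prompts are in English. For more details on MemoRAG Lite, see the example notebook.

Basic usage of MemoRAG

MemoRAG is easy to use and can be initialized directly from HuggingFace models. When you call MemoRAG.memorize(), the memory model builds a global memory over a long input context. Empirically, with the default parameter settings, TommyChien/memorag-qwen2-7b-inst can handle contexts of up to 400K tokens, while TommyChien/memorag-mistral-7b-inst can manage contexts of up to 128K tokens. Increasing the beacon_ratio parameter extends the model's ability to handle longer contexts. For example, with beacon_ratio=16, TommyChien/memorag-qwen2-7b-inst can process up to one million tokens.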

from memorag import MemoRAG

# Initialize MemoRAG pipeline
pipe = MemoRAG(
    mem_model_name_or_path="TommyChien/memorag-mistral-7b-inst",
    ret_model_name_or_path="BAAI/bge-m3", 
    gen_model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2", # Optional: if not specified, the memory model is used as the generator
    cache_dir="path_to_model_cache",  # Optional: specify local model cache directory
    access_token="hugging_face_access_token",  # Optional: Hugging Face access token
    beacon_ratio=4
)

context = open("examples/harry_potter.txt").read()
query = "How many times is the Chamber of Secrets opened in the book?"

# Memorize the context and save to cache
pipe.memorize(context, save_dir="cache/harry_potter/", print_stats=True)

# Generate response using the memorized context
res = pipe(context=context, query=query, task_type="memorag", max_new_tokens=256)
print(f"MemoRAG generated answer: \n{res}")
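The effect of beacon_ratio can be estimated with simple arithmetic. The sketch below assumes, without confirmation from the MemoRAG docs, that beacon_ratio acts as a compression ratio: roughly every beacon_ratio raw tokens are condensed into one memory slot, so the memory footprint scales with ceil(tokens / beacon_ratio). The function names and the slot budget are hypothetical.

```python
def memory_slots(num_tokens: int, beacon_ratio: int) -> int:
    """Approximate compressed memory entries, assuming one slot per beacon_ratio raw tokens."""
    if num_tokens < 0 or beacon_ratio < 1:
        raise ValueError("num_tokens must be >= 0 and beacon_ratio >= 1")
    return -(-num_tokens // beacon_ratio)  # integer ceiling division


def min_beacon_ratio(num_tokens: int, slot_budget: int) -> int:
    """Smallest integer ratio that fits num_tokens into a hypothetical slot budget."""
    if num_tokens < 0 or slot_budget < 1:
        raise ValueError("num_tokens must be >= 0 and slot_budget >= 1")
    return max(1, -(-num_tokens // slot_budget))


for ratio in (4, 8, 16):
    slots = memory_slots(1_000_000, ratio)
    print(f"beacon_ratio={ratio:>2}: 1,000,000 tokens -> about {slots:,} memory slots")
```

Under this assumption, raising beacon_ratio from 4 to 16 shrinks the memory footprint for the same context by 4x, which fits the direction of the capacity figures quoted above; the real limit also depends on GPU memory and implementation details.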

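When you run the code above, the encoded key-value (KV) cache, the Faiss index, and the chunked passages are saved in the specified save_dir. Later, if you use the same context again, you can load the data quickly from disk:

pipe.load("cache/harry_potter/", print_stats=True)

Loading from the cache is typically very efficient. For example, with TommyChien/memorag-qwen2-7b-inst as the memory model, encoding, chunking, and indexing a 200K-token context takes about 35 seconds, whereas loading it from the cache files takes only about 1.5 seconds.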
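Because memorize() writes everything to save_dir and load() reads it back, a small wrapper can choose the fast path automatically. The helper below is a hypothetical sketch: it relies only on the two calls shown above (`pipe.memorize(context, save_dir=..., print_stats=...)` and `pipe.load(save_dir, print_stats=...)`), derives the cache directory from a hash of the context so an edited document does not reuse a stale cache, and treats any non-empty directory as a valid cache without inspecting its contents. A stand-in `DemoPipe` replaces the real pipeline so the sketch runs without a GPU.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_dir_for(context: str, root: str = "cache") -> str:
    """Derive a per-context cache directory from a SHA-256 digest of the text."""
    digest = hashlib.sha256(context.encode("utf-8")).hexdigest()[:16]
    return str(Path(root) / digest)


def memorize_or_load(pipe, context: str, save_dir: str, print_stats: bool = False) -> str:
    """Load an existing cache from save_dir if present; otherwise memorize and save."""
    cache = Path(save_dir)
    if cache.is_dir() and any(cache.iterdir()):
        pipe.load(save_dir, print_stats=print_stats)
        return "loaded"
    pipe.memorize(context, save_dir=save_dir, print_stats=print_stats)
    return "memorized"


class DemoPipe:
    """Stand-in exposing the same two methods; it only records which one was called."""

    def __init__(self):
        self.calls = []

    def memorize(self, context, save_dir, print_stats=False):
        self.calls.append("memorize")
        Path(save_dir).mkdir(parents=True, exist_ok=True)
        # A real pipeline would write the KV cache, Faiss index and chunks here.
        (Path(save_dir) / "placeholder.txt").write_text("cached", encoding="utf-8")

    def load(self, save_dir, print_stats=False):
        self.calls.append("load")


demo = DemoPipe()
with tempfile.TemporaryDirectory() as root:
    text = "example long context " * 1000
    save_dir = cache_dir_for(text, root)
    first = memorize_or_load(demo, text, save_dir)   # builds the cache
    second = memorize_or_load(demo, text, save_dir)  # reuses it
print(first, second, demo.calls)
```

With a real pipeline the call would look like `memorize_or_load(pipe, context, cache_dir_for(context, "cache/"), print_stats=True)`; hashing the text costs far less than the roughly 35 seconds of re-encoding it avoids.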

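Using long-context LLMs as the memory model

Thanks to their ever-growing context windows, recent LLMs have become effective memory models. MemoRAG now supports using these long-context LLMs as memory models and uses MInference to optimize context prefilling. We have tested Meta-Llama-3.1-8B-Instruct and Llama3.1-8B-Chinese-Chat as memory models; both natively support a 128K context length. We are currently exploring other suitable LLMs and optimizing strategies to further enhance the memory mechanism and extend the context length. For detailed usage, see the provided scripts and notebooks: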

from memorag import MemoRAG
model = MemoRAG(
    mem_model_name_or_path="shenzhi-wang/Llama3.1-8B-Chinese-Chat",    # For Chinese
    # mem_model_name_or_path="meta-llama/Meta-Llama-3.1-8B-Instruct",  # For English
    ret_model_name_or_path="BAAI/bge-m3",
    # cache_dir="path_to_model_cache",  # to specify local model cache directory (optional)
    # access_token="hugging_face_access_token"  # to specify a Hugging Face access token (optional)
    )

After that, you can use MemoRAG's features as usual.

Summarization task

To perform a summarization task, use the following script:

res = pipe(context=context, task_type="summarize", max_new_tokens=512)
print(f"MemoRAG summary of the full book:\n {res}")

Using an API as the generator

If you want to use an API as the generator, refer to the script below:

from memorag import Agent, MemoRAG

# API configuration
api_dict = {
    "endpoint": "",
    "api_version": "2024-02-15-preview",
    "api_key": ""
}
model = "gpt-35-turbo-16k"
source = "azure"

# Initialize Agent with the API
agent = Agent(model, source, api_dict)
print(agent.generate("hi!"))  # Test the API

# Initialize MemoRAG pipeline with a customized generator model
pipe = MemoRAG(
    mem_model_name_or_path="TommyChien/memorag-qwen2-7b-inst",
    ret_model_name_or_path="BAAI/bge-m3",
    cache_dir="path_to_model_cache",  # Optional: specify local model cache directory
    customized_gen_model=agent,
)

# Load previously cached context
pipe.load("cache/harry_potter_qwen/", print_stats=True)

# Use the loaded context for question answering
query = "How are the mutual relationships between the main characters?"
context = open("harry_potter.txt").read()

res = pipe(context=context, query=query, task_type="memorag", max_new_tokens=256)
print(f"MemoRAG with GPT-3.5 generated answer: \n{res}")

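Supported generator APIs

The built-in Agent object supports models from openai and deepseek. The configurations for initializing these models are as follows: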

# Using deepseek models
model = ""
source = "deepseek"
api_dict = {
    "base_url": "",
    "api_key": ""
}

# Using openai models
model = ""
source = "openai"
api_dict = {
    "api_key": ""
}
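Rather than pasting keys into the script, the api_dict can be assembled from environment variables. The sketch below covers the three sources that appear in the configurations above (azure, deepseek, openai), using the keys each one shows; the `MEMORAG_<SOURCE>_<KEY>` variable names and the `api_config_from_env` helper are a hypothetical convention, not something memorag reads on its own.

```python
import os
from typing import Dict, Mapping, Optional

# Keys each source's api_dict uses in the configurations shown above.
REQUIRED_KEYS = {
    "azure": ("endpoint", "api_version", "api_key"),
    "deepseek": ("base_url", "api_key"),
    "openai": ("api_key",),
}


def api_config_from_env(source: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build an api_dict for `source` from MEMORAG_<SOURCE>_<KEY> variables (hypothetical names)."""
    env = os.environ if env is None else env
    if source not in REQUIRED_KEYS:
        raise ValueError(f"unsupported source: {source!r}")
    api_dict: Dict[str, str] = {}
    missing = []
    for key in REQUIRED_KEYS[source]:
        var = f"MEMORAG_{source.upper()}_{key.upper()}"
        value = env.get(var)
        if value:
            api_dict[key] = value
        else:
            missing.append(var)
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return api_dict


# With memorag installed, the result plugs into the Agent call shown earlier, e.g.:
#   agent = Agent("gpt-35-turbo-16k", "azure", api_config_from_env("azure"))
example = api_config_from_env("openai", {"MEMORAG_OPENAI_API_KEY": "sk-placeholder"})
print(example)
```

Failing fast with the full list of missing variables makes a misconfigured deployment obvious before any model is loaded.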

Using the memory model

The memory model can also be used on its own to store, recall, and interact with context. For instance, the memory-augmented retrieval example below calls the memory model directly through pipe.mem_model.recall().

Using memory-augmented retrieval

Beyond the standalone memory model, MemoRAG also provides memory-augmented retrieval, which improves evidence retrieval by using clues recalled from memory.

from memorag import MemoRAG

# Initialize MemoRAG pipeline
pipe = MemoRAG(
    mem_model_name_or_path="TommyChien/memorag-qwen2-7b-inst",
    ret_model_name_or_path="BAAI/bge-m3",
    cache_dir="path_to_model_cache",  # Optional: specify local model cache directory
    access_token="hugging_face_access_token"  # Optional: Hugging Face access token
)

# Load and memorize the context
test_txt = open("harry_potter.txt").read()
pipe.memorize(test_txt, save_dir="cache/harry_potter/", print_stats=True)

# Define the query
query = "How are the mutual relationships between the main characters?"

# Recall clues from memory
clues = pipe.mem_model.recall(query).split("\n")
clues = [q for q in clues if len(q.split()) > 3]  # Filter out short or irrelevant clues
print("Clues generated from memory:\n", clues)

# Retrieve relevant passages based on the recalled clues
retrieved_passages = pipe._retrieve(clues)
print("\n======\n".join(retrieved_passages[:3]))
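The filter above keeps only recalled lines with more than three words before retrieval. How `pipe._retrieve` combines results from several clues is not documented here, so the sketch below swaps in one common option, reciprocal rank fusion (RRF): each query or clue produces its own ranked list of passage IDs, and a passage's fused score is the sum of 1 / (k + rank) over the lists it appears in. `filter_clues`, `reciprocal_rank_fusion`, the passage IDs, and k = 60 are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def filter_clues(recalled: str, min_words: int = 4) -> List[str]:
    """Split recalled text into lines and keep those with at least min_words words
    (min_words=4 matches the `len(q.split()) > 3` filter above)."""
    return [line.strip() for line in recalled.split("\n") if len(line.split()) >= min_words]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of passage IDs; a higher fused score ranks first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


recalled = ("Harry and Ron\n"
            "Harry and Ron become friends on the train\n"
            "Hermione joins them after the troll incident\n")
clues = filter_clues(recalled)
# Hypothetical rankings: one from the query itself, then one per surviving clue.
rankings = [["p3", "p1", "p7"], ["p1", "p5"], ["p5", "p1", "p3"]]
print(clues)
print(reciprocal_rank_fusion(rankings))
```

RRF only needs ranks, not comparable scores, which is why it is a popular way to merge lists from different queries; passage p1 wins here because it ranks well in all three lists.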

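Benchmark evaluation

Below are the results of experiments that pair the memory model with three generator models. We tested MemoRAG on three benchmarks; in the original table, the best result in each block is shown in bold.

Benchmarks: LongBench (NarrativeQA to GovReport), InfBench (En.sum, En.qa), UltraDomain (Fin, Legal, Mix)
Method	NarrativeQA	Qasper	MultifieldQA	Musique	2Wiki	HotpotQA	MultiNews	GovReport	En.sum	En.qa	Fin	Legal	Mix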
Generator: Llama3-8B-Instruct-8K
Full	21.3	43.4	46.6	23.5	38.2	47.1	24.6	23.6	13.1	6.7	34.2	33.2	42.7
BGE-M3	22.1	44.3	50.2	22.2	36.7	48.4	22.1	20.1	12.1	15.1	41.4	40.6	46.4
Stella-v5	12.3	35.2	44.4	22.1	33.3	41.9	22.1	20.7	11.7	14.8	41.9	33.7	44.9
RQ-RAG	20.2	43.9	49.1	22.7	36.1	44.5	20.6	21.0	12.0	13.3	39.5	36.8	44.5
HyDE	22.1	44.3	50.2	22.2	36.7	48.4	-	-	-	19.1	41.4	40.6	46.4
MemoRAG	22.8	45.7	50.7	28.4	51.4	57.0	27.4	27.9	14.1	16.1	47.8	47.9	55.5
Generator: Phi-3-mini-128K
Full	21.4	35.0	47.3	19.0	35.5	42.1	25.6	23.7	13.0	15.2	44.8	40.5	44.7
BGE-M3	20.3	33.0	44.3	21.1	35.4	42.1	17.7	19.8	9.6	16.3	41.7	41.2	43.7
Stella-v5	13.7	32.4	43.5	21.0	35.6	40.6	20.3	18.2	10.0	19.5	42.8	35.1	43.9
RQ-RAG	19.6	34.1	46.5	21.9	36.1	41.7	20.1	18.6	10.4	16.1	41.8	40.9	43.2
HyDE	18.7	36.0	47.5	20.5	36.8	42.7	-	-	-	19.6	43.1	41.6	44.2
MemoRAG	27.5	43.9	52.2	33.9	54.1	54.8	32.9	26.3	15.7	22.9	51.5	51.0	55.6
Generator: Mistral-7B-Instruct-v0.2-32K
Full	20.8	29.2	46.3	18.9	20.6	37.6	23.0	20.4	12.4	12.3	36.5	35.8	42.1
BGE-M3	17.3	29.5	46.3	18.5	20.3	36.2	24.3	26.1	13.5	12.2	40.5	42.0	41.1
Stella-v5	13.5	23.7	42.1	18.6	22.2	31.9	21.1	18.5	13.2	9.7	40.9	34.9	42.1
RQ-RAG	17.1	29.2	47.0	19.1	21.5	37.0	22.1	18.6	13.1	12.7	44.3	44.6	43.4
HyDE	17.4	29.5	46.3	18.5	20.1	36.2	-	-	-	12.2	42.8	35.1	43.9
MemoRAG	23.1	31.2	50.0	26.9	30.3	42.9	27.1	31.6	17.9	15.4	48.0	51.2	53.6
MemoRAG-qwen2	22.2	32.7	49.6	31.4	33.7	44.4	27.0	31.5	16.8	17.6	48.7	52.3	48.6
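To read the raw numbers quickly, a short script can recompute which method scores highest in each column of a block. The sketch below parses the Llama3-8B-Instruct-8K block copied from the table, treats "-" as a missing score, and reports every method tied for the top value; `parse_block` and `best_per_column` are illustrative names.

```python
from typing import Dict, List, Optional

COLUMNS = ["NarrativeQA", "Qasper", "MultifieldQA", "Musique", "2Wiki", "HotpotQA",
           "MultiNews", "GovReport", "En.sum", "En.qa", "Fin", "Legal", "Mix"]

# Generator: Llama3-8B-Instruct-8K block from the table above ("-" = not reported).
LLAMA3_BLOCK = """
Full      21.3 43.4 46.6 23.5 38.2 47.1 24.6 23.6 13.1  6.7 34.2 33.2 42.7
BGE-M3    22.1 44.3 50.2 22.2 36.7 48.4 22.1 20.1 12.1 15.1 41.4 40.6 46.4
Stella-v5 12.3 35.2 44.4 22.1 33.3 41.9 22.1 20.7 11.7 14.8 41.9 33.7 44.9
RQ-RAG    20.2 43.9 49.1 22.7 36.1 44.5 20.6 21.0 12.0 13.3 39.5 36.8 44.5
HyDE      22.1 44.3 50.2 22.2 36.7 48.4 -    -    -    19.1 41.4 40.6 46.4
MemoRAG   22.8 45.7 50.7 28.4 51.4 57.0 27.4 27.9 14.1 16.1 47.8 47.9 55.5
"""


def parse_block(block: str) -> Dict[str, List[Optional[float]]]:
    """Map each method name to its scores; '-' becomes None."""
    rows: Dict[str, List[Optional[float]]] = {}
    for line in block.strip().splitlines():
        name, *cells = line.split()
        if len(cells) != len(COLUMNS):
            raise ValueError(f"{name}: expected {len(COLUMNS)} scores, got {len(cells)}")
        rows[name] = [None if c == "-" else float(c) for c in cells]
    return rows


def best_per_column(rows: Dict[str, List[Optional[float]]]) -> Dict[str, List[str]]:
    """For each column, list every method that reaches the highest reported score."""
    best: Dict[str, List[str]] = {}
    for i, column in enumerate(COLUMNS):
        reported = {name: vals[i] for name, vals in rows.items() if vals[i] is not None}
        top = max(reported.values())
        best[column] = [name for name, v in reported.items() if v == top]
    return best


rows = parse_block(LLAMA3_BLOCK)
best = best_per_column(rows)
for column, winners in best.items():
    print(f"{column:<12} {', '.join(winners)}")
```

Run on this block, it flags MemoRAG as best in every column except En.qa, where HyDE (19.1) leads; pasting another block in place of LLAMA3_BLOCK works the same way.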