HippoRAG 2 原理精读

news2025/3/12 22:04:54

提示:文章写完后,目录可以自动生成,如何生成可参考右边的帮助文档

文章目录

    • 整体流程
      • 离线索引阶段
      • 在线检索和问答阶段
    • 总结

整体流程

在这里插入图片描述

从上图可以看出,整个流程分为两个阶段

1、离线索引阶段

2、在线检索和问答阶段

离线索引阶段

1、对文档进行分段
2、抽取实体

对应的prompt

ner_system = """Your task is to extract named entities from the given paragraph. 
Respond with a JSON list of entities.
"""

one_shot_ner_paragraph = """Radio City
Radio City is India's first private FM radio station and was started on 3 July 2001.
It plays Hindi, English and regional songs.
Radio City recently forayed into New Media in May 2008 with the launch of a music portal - PlanetRadiocity.com that offers music related news, videos, songs, and other music-related features."""


one_shot_ner_output = """{"named_entities":
    ["Radio City", "India", "3 July 2001", "Hindi", "English", "May 2008", "PlanetRadiocity.com"]
}
"""


prompt_template = [
    {"role": "system", "content": ner_system},
    {"role": "user", "content": one_shot_ner_paragraph},
    {"role": "assistant", "content": one_shot_ner_output},
    {"role": "user", "content": "${passage}"}
]

LLM的输出如

{
  "named_entities": [
    "迦楼罗",
    "大鹏金翅鸟",
    "岳飞"
  ]
}

3、根据实体和文本生成三元组

from .ner import one_shot_ner_paragraph, one_shot_ner_output
from ...utils.llm_utils import convert_format_to_template

ner_conditioned_re_system = """Your task is to construct an RDF (Resource Description Framework) graph from the given passages and named entity lists. 
Respond with a JSON list of triples, with each triple representing a relationship in the RDF graph. 

Pay attention to the following requirements:
- Each triple should contain at least one, but preferably two, of the named entities in the list for each passage.
- Clearly resolve pronouns to their specific names to maintain clarity.

"""


ner_conditioned_re_frame = """Convert the paragraph into a JSON dict, it has a named entity list and a triple list.
Paragraph:

{passage}


{named_entity_json}
"""


ner_conditioned_re_input = ner_conditioned_re_frame.format(passage=one_shot_ner_paragraph, named_entity_json=one_shot_ner_output)


ner_conditioned_re_output = """{"triples": [
            ["Radio City", "located in", "India"],
            ["Radio City", "is", "private FM radio station"],
            ["Radio City", "started on", "3 July 2001"],
            ["Radio City", "plays songs in", "Hindi"],
            ["Radio City", "plays songs in", "English"],
            ["Radio City", "forayed into", "New Media"],
            ["Radio City", "launched", "PlanetRadiocity.com"],
            ["PlanetRadiocity.com", "launched in", "May 2008"],
            ["PlanetRadiocity.com", "is", "music portal"],
            ["PlanetRadiocity.com", "offers", "news"],
            ["PlanetRadiocity.com", "offers", "videos"],
            ["PlanetRadiocity.com", "offers", "songs"]
    ]
}
"""


prompt_template = [
    {"role": "system", "content": ner_conditioned_re_system},
    {"role": "user", "content": ner_conditioned_re_input},
    {"role": "assistant", "content": ner_conditioned_re_output},
    {"role": "user", "content": convert_format_to_template(original_string=ner_conditioned_re_frame, placeholder_mapping=None, static_values=None)}
]

这一步会根据上面抽取出来的实体、原始文本内容生成三元组。

格式如:[主语, 谓语, 宾语]

4、对原始文本、三元组、实体进行向量化

使用的是 nvidia/NV-Embed-v2 模型,参数量为7B

对不同类型的数据,使用的prompt不同,如下:

instructions = {
    'ner_to_node': 'Given a phrase, retrieve synonymous or relevant phrases that best match this phrase.',
    'query_to_node': 'Given a question, retrieve relevant phrases that are mentioned in this question.',
    'query_to_fact': 'Given a question, retrieve relevant triplet facts that matches this question.',
    'query_to_sentence': 'Given a question, retrieve relevant sentences that best answer the question.',
    'query_to_passage': 'Given a question, retrieve relevant documents that best answer the question.',
}

5、增加同义词边

连接语义相近的实体

在线检索和问答阶段

1、检索topn 的三元组,用LLM对检索结果过滤无关三元组
* 如果没有检索到三元组,则直接检索相关文档
* 如果检索到三元组,则从构建的图中基于个性化的pagerank算法,检索出topn的相关文档

2、利用LLM基于检索结果回答

对应prompt

one_shot_ircot_demo_docs = (
    """Wikipedia Title: Milk and Honey (album)\nMilk and Honey is an album by John Lennon and Yoko Ono released in 1984. Following the compilation "The John Lennon Collection", it is Lennon's eighth and final studio album, and the first posthumous release of new Lennon music, having been recorded in the last months of his life during and following the sessions for their 1980 album "Double Fantasy". It was assembled by Yoko Ono in association with the Geffen label.\n\n"""
    """Wikipedia Title: John Lennon Museum\nJohn Lennon Museum (ジョン・レノン・ミュージアム , Jon Renon Myūjiamu ) was a museum located inside the Saitama Super Arena in Chūō-ku, Saitama, Saitama Prefecture, Japan. It was established to preserve knowledge of John Lennon's life and musical career. It displayed Lennon's widow Yoko Ono's collection of his memorabilia as well as other displays. The museum opened on October 9, 2000, the 60th anniversary of Lennon’s birth, and closed on September 30, 2010, when its exhibit contract with Yoko Ono expired. A tour of the museum began with a welcoming message and short film narrated by Yoko Ono (in Japanese with English headphones available), and ended at an avant-garde styled "reflection room" full of chairs facing a slide show of moving words and images. After this room there was a gift shop with John Lennon memorabilia available.\n\n"""
    """Wikipedia Title: Walls and Bridges\nWalls and Bridges is the fifth studio album by English musician John Lennon. It was issued by Apple Records on 26 September 1974 in the United States and on 4 October in the United Kingdom. Written, recorded and released during his 18-month separation from Yoko Ono, the album captured Lennon in the midst of his "Lost Weekend". "Walls and Bridges" was an American "Billboard" number-one album and featured two hit singles, "Whatever Gets You thru the Night" and "#9 Dream". The first of these was Lennon's first number-one hit in the United States as a solo artist, and his only chart-topping single in either the US or Britain during his lifetime.\n\n"""
    """Wikipedia Title: Nobody Loves You (When You're Down and Out)\n"Nobody Loves You (When You're Down and Out)" is a song written by John Lennon released on his 1974 album "Walls and Bridges". The song is included on the 1986 compilation "Menlove Ave.", the 1990 boxset "Lennon", the 1998 boxset "John Lennon Anthology", the 2005 two-disc compilation "", and the 2010 boxset "Gimme Some Truth".\n\n"""
    """Wikipedia Title: Give Peace a Chance\n"Give Peace a Chance" is an anti-war song written by John Lennon (credited to Lennon–McCartney), and performed with Yoko Ono in Montreal, Quebec, Canada. Released as a single in 1969 by the Plastic Ono Band on Apple Records (catalogue Apple 13 in the United Kingdom, Apple 1809 in the United States), it is the first solo single issued by Lennon, released when he was still a member of the Beatles, and became an anthem of the American anti-war movement during the 1970s. It peaked at number 14 on the "Billboard" Hot 100 and number 2 on the British singles chart.\n"""
)


one_shot_ircot_demo = (
    f'{one_shot_ircot_demo_docs}'
    '\n\nQuestion: '
    f"Nobody Loves You was written by John Lennon and released on what album that was issued by Apple Records, and was written, recorded, and released during his 18 month separation from Yoko Ono?"
    '\nThought: '
    f"The album issued by Apple Records, and written, recorded, and released during John Lennon's 18 month separation from Yoko Ono is Walls and Bridges. Nobody Loves You was written by John Lennon on Walls and Bridges album. So the answer is: Walls and Bridges."
    '\n\n'
)

ircot_system = (
    'You serve as an intelligent assistant, adept at facilitating users through complex, multi-hop reasoning across multiple documents. This task is illustrated through demonstrations, each consisting of a document set paired with a relevant question and its multi-hop reasoning thoughts. Your task is to generate one thought for current step, DON\'T generate the whole thoughts at once! If you reach what you believe to be the final step, start with "So the answer is:".'
    '\n\n'
    f'{one_shot_ircot_demo}'
)


prompt_template = [
    {"role": "system", "content": ircot_system},
    {"role": "user", "content": "${prompt_user}"}
]

总结

1、只是用三元组来协助检索,并没有利用图
2、难以相信这种做法能超越普通RAG几十个点

参考:

HippoRAG 2: From RAGtoMemory: Non-Parametric Continual Learning for Large Language Models

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2313948.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

三:FFMPEG拉流读取模块的讲解

FFMPEG拉流读取模块在远程监控项目最核心的作用是读取UVC摄像头传输的H264码流,并对其码流进行帧的提取,提取完成之后则把数据传输到VDEC解码模块进行解码。而在我们这个项目中,UVC推流的功能由FFMPEG的命令完成。 FFMPEG拉流读取模块的API…

《苍穹外卖》SpringBoot后端开发项目核心知识点与常见问题整理(DAY1 to DAY3)

目录 一、在本地部署并启动Nginx服务1. 解压Nginx压缩包2. 启动Nginx服务3. 验证Nginx是否启动成功: 二、导入接口文档1. 黑马程序员提供的YApi平台2. YApi Pro平台3. 推荐工具:Apifox 三、Swagger1. 常用注解1.1 Api与ApiModel1.2 ApiModelProperty与Ap…

QT系列教程(20) Qt 项目视图便捷类

视频连接 https://www.bilibili.com/video/BV1XY41127t3/?vd_source8be9e83424c2ed2c9b2a3ed1d01385e9 Qt项目视图便捷类 Qt项目视图提供了一些便捷类,包括QListWidget, QTableWidget, QTreeWidget等。我们分别介绍这几个便捷类。 我们先创建一个Qt …

动态扩缩容引发的JVM堆内存震荡:从原理到实践的GC调优指南

目录 一、典型案例:系统发布后的GC雪崩事件 (一)故障现象 1. 刚刚启动时 GC 次数较多 2. 堆内存锯齿状波动 3. GC日志特征:Allocation Failure (二)问题定位 二、原理深度解析:JVM内存弹…

AI智能眼镜主控芯片:技术演进与产业生态的深度解析

一、AI智能眼镜的技术挑战与主控芯片核心诉求 AI智能眼镜作为XR(扩展现实)技术的代表产品,其核心矛盾在于性能、功耗与体积的三角平衡。主控芯片作为设备的“大脑”,需在有限空间内实现复杂计算、多模态交互与全天候续航&#xf…

微服务拆分-远程调用

我们在查询购物车列表的时候,它有一个需求,就是不仅仅要查出购物车当中的这些商品信息,同时还要去查到购物车当中这些商品的最新的价格和状态信息,跟购物车当中的快照进行一个对比,从而去提醒用户。 现在我们已经做了服…

[网络爬虫] 动态网页抓取 — Selenium 介绍 环境配置

🌟想系统化学习爬虫技术?看看这个:[数据抓取] Python 网络爬虫 - 学习手册-CSDN博客 0x01:Selenium 工具介绍 Selenium 是一个开源的便携式自动化测试工具。它最初是为网站自动化测试而开发的,类似于我们玩游戏用的按…

【RAGFlow】windows本地pycharm运行

原因 由于官方只提供了docker部署,基于开源代码需要实现自己内部得逻辑,所以需要本地pycharm能访问,且docker运行依赖得其余组件,均需要使用开发服务器得配置。 修改过程 安装python 项目依赖于Python 版本:>3.1…

树莓派5首次开机保姆级教程(无显示器通过VNC连接树莓派桌面)

第一次开机详细步骤 步骤一:树莓派系统烧录1 搜索打开烧录软件“Raspberry Pi Imager”2 选择合适的设备、系统、SD卡3 烧录配置选项 步骤二:SSH远程树莓派1 树莓派插电2 网络连接(有线或无线)3 确定树莓派IP地址 步骤三&#xff…

html-表格标签

一、表格标签 1. 表格的主要作用 表格主要用于显示、展示数据,因为它可以让数据显示的非常的规整,可读性非常好。特别是后台展示数据 的时候,能够熟练运用表格就显得很重要。一个清爽简约的表格能够把繁杂的数据表现得很有条理。 总…

大模型安全新范式:DeepSeek一体机内容安全卫士发布

2月以来,DeepSeek一体机几乎成为了政企市场AI消费的最强热点。 通过一体机的方式能够缩短大模型部署周期,深度结合业务场景,降低中小企业对于大模型的使用门槛。据不完全统计,已约有超过60家企业基于DeepSeek推出一体机产品。 但…

数据分析绘制随时间顺序变化图加入线性趋势线——numpy库的polyfit计算一次多项式拟合

import pandas as pd import numpy as np import matplotlib.pyplot as plt# 导入数据 data pd.read_csv(rC:\Users\11712\notebooktrain1.csv)# 假设数据包含 date_time 和 speed 列 data[date_time] pd.to_datetime(data[date_time]) # 确保时间列是 datetime 类型 data.s…

密闭空间可燃气体监测终端:守护城市命脉,智驭燃气安全!

近年来,陕西省高度重视燃气安全,出台了一系列政策文件,旨在全面加强城镇燃气安全监管,防范化解重大安全风险。2023年,陕西省安委会印发《全省城镇燃气安全专项整治工作方案》,明确要求聚焦燃气经营、输送配…

阿里千问大模型(Qwen2.5-VL-7B-Instruct)部署

参考链接 知乎帖子 B站视频 huggingface 镜像网站(不太全,比如 Qwen/Qwen2.5-VL-7B-Instruct就没有) huggingface 5种下载方式汇总 通过huggingface-cli下载模型 不一样的部分是预训练权重的下载和demo 首先安装huggingface_hub pip insta…

【Go学习实战】03-3-文章评论及写文章

【Go学习实战】03-3-文章评论及写文章 文章评论注册valine获取凭证加载评论页面 写文章修改cdn位置完善功能查看页面 发布文章POST发布文章发布文章测试 查询文章详情查询详情测试 修改文章修改文章测试 写文章图片上传前端后端逻辑测试 文章评论 这里我们的博客因为是个轻量级…

从零开始用AI开发游戏(一)

1. 核心玩法设计 核心目标:玩家需在随机生成的3D迷宫中寻找出口,躲避陷阱、收集道具、解开谜题。核心机制: 随机生成迷宫:每次游戏生成不同结构的迷宫(递归分割算法或深度优先搜索)。第一人称视角&#xf…

AI-大模型中的流式输出与非流式输出

1.前言 在大模型API开发中,流式与非流式输出对应着两种不同的数据交互,在代码中stream中通过参数true与false来进行设定。 2.流式输出与非流式输出的原理 2.1.非流式输出-请求一次响应返回完整数据 非流式输出,传统的请求-响应模式&#xf…

【HarmonyOS Next】鸿蒙加固方案调研和分析

【HarmonyOS Next】鸿蒙加固方案调研和分析 一、前言 根据鸿蒙应用的上架流程,本地构建app文件后,上架到AGC平台,平台会进行解析。根据鸿蒙系统的特殊设置,仿照IOS的生态闭环方案。只能从AGC应用市场下载app进行安装。这样的流程…

蓝桥杯javaB组备战第二天 题目 区间次方和 编号3382

这是一个前缀和问题,但是不同于以为前缀和问题 前缀和问题求解思路: 创建一个前缀数组 s[] ,存储输入的元素的a[1]到a[n]的和 及:s[1] s[i-1]a[i] ,i>1 这样比暴力算法的复杂度要低很多可以将 时间复杂度从O(q*n*m)下降到 O(n*mq) …

《Android 平台架构系统启动流程详解》

目录 一、平台架构模块 1.1 Linux 内核 1.2 硬件抽象层 (HAL) 1.3 Android 运行时 1.4 原生 C/C 库 1.5 Java API 框架 1.6 系统应用 二、系统启动流程 2.1 Bootloader阶段 2.2 内核启动 2.3 Init进程(PID 1) 2.4 Zygote与System Serv…