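Building a Fully Local, Private RAG Application with Spring AI, Qdrant, and Ollama
Until now, Python has been the go-to language for implementing retrieval-augmented generation (RAG) applications, and it has become almost the default choice for building large language model (LLM) applications. For Java enthusiasts and advocates, however, this trend does not spell the end. On the contrary, it is an opportunity to innovate. In this article we explore how to build a scalable, local RAG application for processing complex documents by combining the robustness of Spring Boot, the efficiency of Qdrant, and the intelligence of Ollama.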
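Introduction
The architecture shown in the diagram is a way of processing and analyzing complex documents such as research reports and financial reports. The user first loads a document through an API called /load and then asks the system questions through another API called /ask. The system is therefore interactive: the document is loaded first, and the queries that follow let the user extract meaningful information from it.
At the core of this architecture is Spring AI. When a document is loaded, Spring AI reads and parses its text and turns the content of the complex document into a structured form suitable for further processing. This step prepares the document for the next stage of the pipeline.
After Spring AI's initial processing comes data processing and storage. The processed text is converted into vectors (embeddings), numerical representations that capture the semantic essence of the document. This conversion is crucial because it is what lets the system find the passages relevant to a question. Ollama serves both models used here: an embedding model that produces the vectors and the Llama3 chat model, which uses the retrieved content to understand the question and respond intelligently. The embeddings are stored in Qdrant, a vector store. Storing the information as vectors lets the system retrieve and analyze data efficiently, so it can answer user queries quickly. This storage is also what makes repeated interaction possible: every query draws on the stored embeddings to provide precise, context-relevant answers.
With this three-part architecture, the system offers a seamless path from document loading to information retrieval, backed by Llama3 for processing and understanding complex research data.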
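To make the retrieval step above more concrete, here is a minimal, self-contained sketch of ranking stored chunks by cosine similarity to a query vector. It is only an illustration under stated assumptions: the class name `SimilaritySketch`, the three-dimensional vectors, and the chunk ids are all hypothetical, and a real vector store such as Qdrant uses indexed search over much larger vectors rather than a linear scan.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Toy illustration of vector retrieval: rank stored chunks by cosine similarity. */
final class SimilaritySketch {

    /** Cosine similarity between two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the topK stored vectors most similar to the query. */
    static List<String> topK(Map<String, double[]> store, double[] query, int topK) {
        return store.entrySet().stream()
                // Negate the score so that the most similar entries sort first.
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(e.getValue(), query)))
                .limit(topK)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // Hypothetical embeddings; real ones come from an embedding model.
        Map<String, double[]> store = Map.of(
                "chunk-graph-db", new double[] {0.9, 0.1, 0.0},
                "chunk-vector-db", new double[] {0.1, 0.9, 0.1},
                "chunk-hybrid-work", new double[] {0.0, 0.1, 0.9});
        double[] query = {0.2, 0.8, 0.0};
        System.out.println(topK(store, query, 2));
    }
}
```

The chunks returned by such a search are what get passed to the chat model as context; the model itself never sees the vectors.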
Implementation
The project scaffold is as follows:
The code details are below.
pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.2.5</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>org.liugddx</groupId>
<artifactId>springboot-rag</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>springboot-rag</name>
<description>springboot-rag</description>
<properties>
<java.version>21</java.version>
<spring-ai.version>0.8.1</spring-ai.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-qdrant-store-spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-devtools</artifactId>
<scope>runtime</scope>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<excludes>
<exclude>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</build>
<repositories>
<repository>
<id>spring-milestones</id>
<name>Spring Milestones</name>
<url>https://repo.spring.io/milestone</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
<repository>
<id>spring-snapshots</id>
<name>Spring Snapshots</name>
<url>https://repo.spring.io/snapshot</url>
<releases>
<enabled>false</enabled>
</releases>
</repository>
</repositories>
</project>
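The pom declares the main Spring AI dependencies. Next, create the configuration file application.yaml:
The most important point here is that everything, from the LLM to the vector store, is configured to run locally, which gives us the highest level of privacy and security. A local setup may not always meet our requirements, though. In that case, just change the host URLs in this file and the rest of the code keeps working.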
spring:
  application:
    name: springboot-rag
  threads:
    virtual:
      enabled: true
  ai:
    ollama:
      base-url: "http://localhost:11434"
      embedding:
        model: "mxbai-embed-large:latest"
      chat:
        model: "llama3:latest"
        options:
          temperature: 0.3
          top-k: 2
          top-p: 0.2
          num-g-p-u: 1 # enable Metal GPU on Mac
    vectorstore:
      qdrant:
        host: localhost
        port: 6334
        collection-name: infoq-report
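We will download a complex report on 2023-2024 NoSQL trends from RavenDB.
Now let's save the system prompt file and the data in dedicated folders of the project's resources, as prompts/system.pmt and data/InfoQ-NoSql-trend-Report.pdf respectively (the file name that DataIndexerServiceImpl loads from the classpath).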
You're fielding inquiries related to a technology trends report provided by InfoQ.
The report is a compilation of the most sought-after InfoQ Trends Reports from 2023,
encompassing a range of topics like technological advancements, software development tendencies, and organizational best practices.
It delves into diverse themes such as the dynamics of hybrid work environments, comparative analysis of architectural patterns including monoliths, microservices, and moduliths, the role of platform engineering, the evolution of large language models (LLMs), and the latest developments in the Java domain.
Leverage the insights from the DOCUMENTS section to inform your responses, drawing on the information as if it were your own knowledge base. If the answer isn't clear, it's best to acknowledge the gap in information.
DOCUMENTS:
{documents}
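The `{documents}` placeholder in the prompt above is filled at query time with the text of the retrieved chunks. The following is a minimal sketch of that substitution using plain string replacement. It is not how Spring AI's `SystemPromptTemplate` is implemented; the class name `PromptFillSketch` and the `render` helper are hypothetical and exist only to show the idea.

```java
import java.util.List;
import java.util.Map;

/** Illustrative stand-in for prompt templating: replaces {name} placeholders. */
final class PromptFillSketch {

    /** Replaces every {key} in the template with the matching value. */
    static String render(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "Answer using the DOCUMENTS section.\nDOCUMENTS:\n{documents}";
        // Hypothetical retrieved chunks, joined with newlines as in RAGServiceImpl.
        String documents = String.join("\n", List.of("First chunk.", "Second chunk."));
        System.out.println(render(template, Map.of("documents", documents)));
    }
}
```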
Let's look at the two main interfaces and their implementations.
DataIndexer.java
package org.liugddx.springbootrag.service;
public interface DataIndexer {
void loadData();
long count();
}
DataIndexerServiceImpl.java:
package org.liugddx.springbootrag.service.impl;
import org.liugddx.springbootrag.service.DataIndexer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.DocumentReader;
import org.springframework.ai.reader.ExtractedTextFormatter;
import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.reader.pdf.config.PdfDocumentReaderConfig;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
@Service
public class DataIndexerServiceImpl implements DataIndexer {
@Value("classpath:/data/InfoQ-NoSql-trend-Report.pdf")
private Resource documentResource;
private final VectorStore vectorStore;
Logger logger = LoggerFactory.getLogger(DataIndexerServiceImpl.class);
public DataIndexerServiceImpl(VectorStore vectorStore) {
this.vectorStore = vectorStore;
}
@Override
public void loadData() {
DocumentReader documentReader = null;
if(this.documentResource.getFilename() != null && this.documentResource.getFilename().endsWith(".pdf")){
this.logger.info("Loading PDF document");
documentReader = new PagePdfDocumentReader(this.documentResource,
PdfDocumentReaderConfig.builder()
.withPageExtractedTextFormatter(ExtractedTextFormatter.builder()
.withNumberOfBottomTextLinesToDelete(3)
.withNumberOfTopPagesToSkipBeforeDelete(3)
.build())
.withPagesPerDocument(3)
.build());
}else if (this.documentResource.getFilename() != null && this.documentResource.getFilename().endsWith(".txt")) {
documentReader = new TextReader(this.documentResource);
} else if (this.documentResource.getFilename() != null && this.documentResource.getFilename().endsWith(".json")) {
documentReader = new JsonReader(this.documentResource);
}
if(documentReader != null){
var textSplitter = new TokenTextSplitter();
this.logger.info("Loading text document to qdrant vector database");
this.vectorStore.accept(textSplitter.apply(documentReader.get()));
this.logger.info("Loaded text document to qdrant vector database");
}
}
@Override
public long count() {
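// Note: similaritySearch returns only the top-K matches of one search,
// so this is the size of that result, not the total number of stored vectors.
return this.vectorStore.similaritySearch("*").size();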
}
}
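Before the documents reach the vector store, the indexer above splits them into smaller chunks so that each chunk can be embedded and retrieved on its own. The sketch below illustrates that idea only. It splits on whitespace-separated words, whereas `TokenTextSplitter` works on tokens, and the class name `ChunkingSketch` and the `maxWords` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative chunker: groups words into fixed-size chunks (words stand in for tokens). */
final class ChunkingSketch {

    /** Splits text into chunks of at most maxWords whitespace-separated words. */
    static List<String> split(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String page = "NoSQL databases continue to evolve with new multi-model features";
        for (String chunk : split(page, 4)) {
            System.out.println(chunk);
        }
    }
}
```

Smaller chunks tend to give more focused search hits, while larger chunks carry more context per hit; the right size depends on the documents and the embedding model.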
RAGService.java:
package org.liugddx.springbootrag.service;
public interface RAGService {
String findAnswer(String query);
}
RAGServiceImpl.java:
package org.liugddx.springbootrag.service.impl;
import org.liugddx.springbootrag.service.RAGService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.ChatResponse;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.SystemPromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
@Service
public class RAGServiceImpl implements RAGService {
private final Logger logger = LoggerFactory.getLogger(this.getClass());
@Value("classpath:/prompts/system.pmt")
private Resource systemPromptResource;
private final VectorStore vectorStore;
private final ChatClient aiClient;
public RAGServiceImpl(VectorStore vectorStore, ChatClient aiClient) {
this.vectorStore = vectorStore;
this.aiClient = aiClient;
}
@Override
public String findAnswer(String query) {
// Combine system message retrieval and AI model call into a single operation
ChatResponse aiResponse = aiClient.call(new Prompt(List.of(
getRelevantDocs(query),
new UserMessage(query))));
// Log only necessary information, and use efficient string formatting
logger.info("Asked AI model and received response.");
return aiResponse.getResult().getOutput().getContent();
}
private Message getRelevantDocs(String query) {
List<Document> similarDocuments = vectorStore.similaritySearch(query);
// Log the document count efficiently
if (logger.isInfoEnabled()) {
logger.info("Found {} relevant documents.", similarDocuments.size());
}
// Streamline document content retrieval
String documents = similarDocuments.stream()
.map(Document::getContent)
.collect(Collectors.joining("\n"));
SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(this.systemPromptResource);
return systemPromptTemplate.createMessage(Map.of("documents", documents));
}
}
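The flow in `findAnswer` is: retrieve similar chunks, join them into the system prompt, then send the system message and the user question to the model. The sketch below mirrors that flow with plain Java functions so the steps can be followed without a running Ollama or Qdrant. Everything in it is an assumption made for illustration: the `RagFlowSketch` class, the `ChatMessage` record, and the stubbed retriever and model are hypothetical stand-ins, not Spring AI types.

```java
import java.util.List;
import java.util.function.Function;

/** Illustrative RAG flow with stubbed retrieval and model calls. */
final class RagFlowSketch {

    /** Minimal stand-in for a chat message with a role and content. */
    record ChatMessage(String role, String content) {}

    static String findAnswer(String query,
                             Function<String, List<String>> retriever,
                             Function<List<ChatMessage>, String> model,
                             String systemTemplate) {
        // 1. Retrieve the chunks most relevant to the question.
        String documents = String.join("\n", retriever.apply(query));
        // 2. Put the retrieved text into the system prompt.
        ChatMessage system = new ChatMessage("system",
                systemTemplate.replace("{documents}", documents));
        // 3. Send the system message and the user question to the model.
        ChatMessage user = new ChatMessage("user", query);
        return model.apply(List.of(system, user));
    }

    public static void main(String[] args) {
        // Stub retriever and model; a real setup would call the vector store and the LLM.
        Function<String, List<String>> retriever = q -> List.of("Chunk one.", "Chunk two.");
        Function<List<ChatMessage>, String> model =
                messages -> "Received " + messages.size() + " messages.";
        System.out.println(findAnswer("What changed?", retriever, model,
                "DOCUMENTS:\n{documents}"));
    }
}
```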
Next come the controllers, which expose the operations from loading the report into the vector store to answering user questions.
DataIndexController.java:
package org.liugddx.springbootrag.controller;
import org.liugddx.springbootrag.service.DataIndexer;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
@RequestMapping("/api/v1")
public class DataIndexController {
private final DataIndexer dataIndexer;
public DataIndexController(DataIndexer dataIndexer) {
this.dataIndexer = dataIndexer;
}
@PostMapping("/data/load")
public ResponseEntity<String> load() {
try {
this.dataIndexer.loadData();
return ResponseEntity.ok("Data indexed successfully!");
} catch (Exception e) {
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body("An error occurred while indexing data: " + e.getMessage());
}
}
@GetMapping("/data/count")
public long count() {
return dataIndexer.count();
}
}
RAGController.java:
package org.liugddx.springbootrag.controller;
import org.liugddx.springbootrag.service.RAGService;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.util.LinkedHashMap;
import java.util.Map;
@RestController
@RequestMapping("/api/v1")
public class RAGController {
private final RAGService ragService;
public RAGController(RAGService ragService) {
this.ragService = ragService;
}
@GetMapping("/ask")
public Map<String, String> findAnswer(@RequestParam(value = "question", defaultValue = "give me the general summary on the trend report") String question) {
String answer = this.ragService.findAnswer(question);
Map<String, String> map = new LinkedHashMap<>();
map.put("question", question);
map.put("answer", answer);
return map;
}
}
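Once the application is running, the two controllers can be exercised from any HTTP client. Here is a minimal sketch using the JDK's built-in `java.net.http` client. The base URL `http://localhost:8080` is an assumption (Spring Boot's default port, not something stated in the configuration above), and the class name `RagApiRequests` and the sample question are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Builds and sends requests to the load and ask endpoints defined by the controllers. */
final class RagApiRequests {

    /** POST /api/v1/data/load with an empty body. */
    static HttpRequest loadRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/data/load"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** GET /api/v1/ask with the question URL-encoded as a query parameter. */
    static HttpRequest askRequest(String baseUrl, String question) {
        String encoded = URLEncoder.encode(question, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/ask?question=" + encoded))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:8080"; // assumption: default Spring Boot port
        HttpClient client = HttpClient.newHttpClient();

        // Index the report first, then ask a question about it.
        HttpResponse<String> load =
                client.send(loadRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        System.out.println(load.statusCode() + " " + load.body());

        HttpResponse<String> answer = client.send(
                askRequest(baseUrl, "What are the main NoSQL trends?"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(answer.body());
    }
}
```

Indexing and answering both call local models, so the first requests may take a while depending on the hardware.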
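That's it. Now it is time to run the code. Once everything is working, we should see the server output along with the port the service started on, as shown below.
Summary
In summary, with Spring AI and its integration with Ollama and Qdrant, we have built a simple, fully local RAG system for document question answering. Spring AI serves as the foundation of our retrieval-augmented generation (RAG) pipeline and provides a careful way to digest and interpret complex research reports. Working together with Ollama, it does more than retrieve answers: it generates context-aware responses, which makes the RAG application a conversational assistant rather than a simple query tool. Qdrant improves the process further by providing optimized vector storage, so the most relevant parts of the documents can be retrieved quickly and accurately.
Code repository: git@github.com:liugddx/springbot-rag.git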