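GenAI for Customer Support - Part 4: Tuning RAG Search for Relevance

By Antonio Schönmann, Elastic

Welcome to Part 4 of our blog series on integrating generative AI into Elastic's customer support. This installment takes a close look at the role of Retrieval-Augmented Generation (RAG) in improving our AI-powered Technical Support Assistant. Here we cover the challenges, solutions, and outcomes of refining search effectiveness, and provide action items to further improve its capabilities using the toolset available in Elastic Stack version 8.11.

Through these actions, we achieved a ~75% increase in top-3 result relevance and gained over 300,000 AI-generated summaries that we can use in a range of future applications. If you are new to this series, be sure to check out the earlier posts, listed below, which introduce the core technology and architectural setup.

GenAI for Customer Support - Part 1: Building our Proof of Concept
GenAI for Customer Support - Part 2: Building a Knowledge Library
GenAI for Customer Support - Part 3: Designing a Chat Interface for Chatbots... for Humans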

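The search problem

Perfecting RAG (Retrieval-Augmented Generation) is fundamentally about hitting the bullseye of search accuracy 🎯:

Like an archer carefully aiming at the center of the target, we want to focus on the accuracy of every shot.
Beyond that, we also want to make sure we have the best targets to hit, that is, high-quality data.

Without both, there is a risk that large language models (LLMs) hallucinate and produce misleading responses. Such mistakes would certainly shake users' trust in our system, leading to declining usage and a poor return on investment.

To avoid those negative outcomes, we faced several challenges that helped us refine both search accuracy and data quality along the way. These challenges were instrumental in shaping our approach to tuning RAG relevance, and we're excited to share our insights with you.

That said, let's dive into the details!

Our first approach

We started with a lean, effective solution that could quickly get us a valuable RAG-powered chatbot. This meant focusing on the key functional aspects that would make it operationally ready with solid search capabilities. For context, we'll briefly walk through the four key components of the Support AI Assistant: data, querying, generation, and feedback.

Data

As shown in the second blog post of this series, our journey began with an extensive database of more than 300,000 documents, consisting of Technical Support Knowledge Articles and various pages crawled from our website, such as Elastic's product documentation and blogs. This rich dataset is the foundation of our search queries, ensuring that a broad range of information about Elastic products can be retrieved precisely. To that end, we use Elasticsearch to store and search our data.

Query

With rich data to search over, it's time to talk about our querying component. We adopted a standard hybrid search strategy, which combines the traditional strengths of BM25 (keyword-based search) with semantic search powered by ELSER.

For the semantic search component, we used text_expansion queries against both title and summary embeddings. For broad keyword relevance, on the other hand, we search multiple fields using cross_fields, with a minimum_should_match parameter tuned to perform better on longer queries. Phrase matches, which often indicate greater relevance, receive a higher boost. Here's our initial setup: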

const searchResults = await elasticsearchClient.search({
  // Alias pointing to the knowledge base indices.
  index: "knowledge-search", 
  body: {
    size: 3,
    query: {
      bool: {
        should: [
          // Keyword-search Component. 
          {
            multi_match: {
              query,
              // For queries with more than 3 terms, at least 49% must match.
              minimum_should_match: "1<-1 3<49%", 
              type: "cross_fields",
              fields: [
                "title",
                "summary",
                "body",
                "id",
              ],
            },
          },
          {
            multi_match: {
              query,
              type: "phrase",
              boost: 9,
              fields: [
                // Stem-based versions of our fields. 
                "title.stem",
                "summary.stem",
                "body.stem",
              ],
            },
          },
          // Semantic Search Component.
          {
            text_expansion: {
              "ml.inference.title_expanded.predicted_value": {
                model_id: ".elser_model_2",
                model_text: query,
              },
            },
          },
          {
            text_expansion: {
              "ml.inference.summary_expanded.predicted_value": {
                model_id: ".elser_model_2",
                model_text: query,
              },
            },
          },
        ],
      },
    },
  },
});
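
To make the minimum_should_match expression above more concrete, here is a small sketch of how a conditional spec such as "1<-1 3<49%" resolves to a required number of matching terms. This is an illustrative re-implementation, not Elasticsearch's own code; the function names are hypothetical, and it assumes that each analyzed query term counts as one optional clause.

```typescript
// Resolves a simple spec ("3", "-1", "49%", "-25%") for a given clause count.
const resolveSimpleSpec = (clauseCount: number, spec: string): number => {
  const isPercent = spec.endsWith("%");
  const value = parseInt(isPercent ? spec.slice(0, -1) : spec, 10);
  // Percentages are rounded toward zero.
  const amount = isPercent ? Math.trunc((clauseCount * value) / 100) : value;
  // Negative amounts mean "this many clauses may be missing".
  const required = amount < 0 ? clauseCount + amount : amount;
  return Math.min(Math.max(required, 0), clauseCount);
};

// Resolves a spec that may contain several "N<spec" conditionals.
const resolveMinimumShouldMatch = (clauseCount: number, spec: string): number => {
  let required = clauseCount; // Default: every clause must match.
  for (const part of spec.trim().split(/\s+/)) {
    if (!part.includes("<")) return resolveSimpleSpec(clauseCount, part);
    const [bound, simpleSpec] = part.split("<");
    // A conditional only applies when there are MORE clauses than its bound.
    if (clauseCount <= parseInt(bound, 10)) return required;
    required = resolveSimpleSpec(clauseCount, simpleSpec);
  }
  return required;
};

for (const terms of [1, 2, 3, 4, 6, 10]) {
  const required = resolveMinimumShouldMatch(terms, "1<-1 3<49%");
  console.log(`${terms} term(s) -> ${required} required`);
}
```

Under these assumptions, a single-term query must match its term, two- and three-term queries may miss one term, and longer queries only need 49% of their terms (rounded down), which is what keeps long natural-language questions from returning nothing.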

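Generation

After searching, we build the system prompt with different sets of instructions, also taking the top 3 search results into account as the context to use. Finally, we feed the conversation along with the built context into the LLM, generating a response. Here's pseudocode showing the described behavior: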

// We then feed the context into the LLM, generating a response.
const { stopGeneration } = fetchChatCompletionAPI(
  {
    // The system prompt + vector search results.
    context: buildContext(searchResults), 
    // The entire conversation + the brand new user question.
    messages, 
    // Additional parameters.
    parameters: { model: LLM.GPT4 } 
  },
  {
    onGeneration: (event: StreamGenerationEvent) => {
      // Stream generation events back to the user interface here...
    }
  }
);

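The reason for not including more than 3 search results is the limited number of tokens available in our dedicated Azure OpenAI GPT4 deployment (PTU), combined with a relatively large user base.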
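
As a rough illustration of how a context could be assembled from the top hits under a fixed budget, here is a minimal sketch of a buildContext-style helper. The post does not show the real implementation, so the prompt text, the character cap (a crude stand-in for token counting), and the KnowledgeHit shape are all assumptions.

```typescript
// Assumed shape of a search hit; only the fields used here are listed.
interface KnowledgeHit {
  _source: { title: string; summary: string; body: string };
}

// Hypothetical values: the real system prompt and budget are not shown in the post.
const SYSTEM_PROMPT = "You are a helpful Elastic support assistant.";
const MAX_CHARS_PER_DOCUMENT = 4000;

const buildContext = (hits: KnowledgeHit[], maxDocuments = 3): string => {
  // Keep only the top N hits and cap each body to stay within the budget.
  const documents = hits.slice(0, maxDocuments).map(({ _source }, index) => {
    const body = _source.body.slice(0, MAX_CHARS_PER_DOCUMENT);
    return [`[Document ${index + 1}] ${_source.title}`, _source.summary, body].join("\n");
  });
  return [SYSTEM_PROMPT, "Answer using only the documents below.", ...documents].join("\n\n");
};
```

Capping both the number of documents and their length keeps the prompt size predictable, which matters when many users share the same token capacity.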

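Feedback

We use a third-party tool to capture client-side events, connected to BigQuery for storage, and make the JSON-encoded events available for comprehensive analysis by everyone on the team. Here's a glance at the BigQuery syntax that builds our feedback view. The JSON_VALUE function is a way to extract fields from the event payload: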

  SELECT
    -- Extract relevant fields from event properties
    JSON_VALUE(event_properties, '$.chat_id') AS `Chat ID`,
    JSON_VALUE(event_properties, '$.input') AS `Input`,
    JSON_VALUE(event_properties, '$.output') AS `Output`,
    JSON_VALUE(event_properties, '$.context') AS `Context`,
    
    -- Determine the reaction (like or dislike) to the interaction
    CASE JSON_VALUE(event_properties, '$.reaction')
      WHEN 'disliked' THEN '👎'
      WHEN 'liked' THEN '👍'
    END AS `Reaction`,
    
    -- Extract feedback comment
    JSON_VALUE(event_properties, '$.comment') AS `Comment`,

    event_time AS `Time`
  FROM
    `frontend_events` -- Table containing event data
  WHERE
    event_type = "custom"
    AND JSON_VALUE(event_properties, '$.event_name') IN (
      'Chat Interaction', -- Input, output, context. 
      'Chat Feedback', -- Feedback comments.
      'Response Like/Dislike' -- Thumbs up/down.
    )
  ORDER BY `Chat ID` DESC, `Time` ASC; -- Order results by Chat ID and time

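We also took advantage of valuable direct feedback from internal users about their chatbot experience, which let us quickly identify areas where search results did not match user intent. Combining both was instrumental in the discovery process that enabled us to refine our RAG implementation, as we'll see in the next section.

Challenges

With usage, interesting patterns started to emerge from the feedback. Some user queries, such as those involving specific CVEs or product versions, were yielding suboptimal results, indicating a disconnect between user intent and the GenAI responses. Let's take a closer look at the specific challenges identified and how we solved them.

#1: CVEs (Common Vulnerabilities and Exposures)

Our customers frequently receive alerts about lists of open CVEs that could affect their systems, which often results in support cases. To handle questions about those effectively, our dedicated internal teams carefully maintain CVE-type Knowledge Articles. These articles provide standardized, official descriptions from Elastic, including detailed statements on the implications, and list the artifacts affected by each CVE.

Recognizing that our chatbot could streamline access to this critical information, our internal InfoSec and Support Engineering teams began exploring its capabilities with questions like this: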

👨🏽 What are the implications of CVE's `2016-1837`, `2019-11756` and `2014-6439`?

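For questions like this, one key advantage of using RAG (and a main functional goal of adopting this design) is that we can pull in up-to-date information and include it as context for the LLM, making it instantly available for generating good responses. That naturally saves us time and resources compared with the alternative of fine-tuning an LLM.

However, the generated responses did not perform as expected. Search results, which are essential for answering these questions, often lacked relevance, a fact we can confirm by looking closely at the search results for the example: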

{
  ...
  "hits": [
    {
      "_index": "search-knowledge-articles",
      "_id": "...",
      "_score": 59.449028,
      "_source": {
        "id": "...",
        "title": "CVE-2019-11756", // Hit!
        "summary": "...",
        "body": "...",
        "category": "cve"
      }
    },
    {
      "_index": "search-knowledge-articles",
      "_id": "...",
      "_score": 42.15182,
      "_source": {
        "title": "CVE-2019-10172", // :(
        "summary": "...",
        "body": "...",
        "category": "cve"
      }
    },
    {
      "_index": "search-docs",
      "_id": "...",
      "_score": 38.413914,
      "_source": {
        "title": "Potential Sudo Privilege Escalation via CVE-2019-14287 | Elastic  Security Solution [8.11] | Elastic",  // :(
        "summary": "...",
        "body": "...",
        "category": "documentation"
      }
    }
  ]
}

With just one relevant hit (CVE-2019-11756), we deprived the LLM of the context it needed to generate a proper answer:

The context only contains information about CVE-2019-11756, which is...

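The observed behavior prompted an interesting question:

How could we use the fact that users often include near-exact CVE codes in their queries to improve the accuracy of our search results?

To answer it, we approached the issue as a search challenge. We hypothesized that by emphasizing title field matches for articles that directly contain the CVE codes, we could significantly improve the precision of our search results. This led to a strategic decision to conditionally boost the weight of title matches in our search algorithm. With this targeted adjustment, we refined our query strategy as follows: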

    ...
    should: [
        // Additional boosting for CVEs.
        {
          bool: {
            filter: {
              term: {
                category: 'cve',
              },
            },
            must: {
              match: {
                title: {
                  query: queryText,
                  boost: 10,
                },
              },
            },
          },
        },
        // BM-25 based search.
        {
          multi_match: {
             ...

As a result, we got a much better hit rate for CVE-related use cases, ensuring that CVE-2016-1837, CVE-2019-11756, and CVE-2014-6439 are the top 3:

{
  ...
  "hits": [
    {
      "_index": "search-knowledge-articles",
      "_id": "...",
      "_score": 181.63962,
      "_source": {
        "title": "CVE-2019-11756",
        "summary": "...",
        "body": "...",
        "category": "cve"
      }
    },
    {
      "_index": "search-knowledge-articles",
      "_id": "...",
      "_score": 175.13728,
      "_source": {
        "title": "CVE-2014-6439",
        "summary": "...",
        "body": "...",
        "category": "cve"
      }
    },
    {
      "_index": "search-knowledge-articles",
      "_id": "...",
      "_score": 152.9553,
      "_source": {
        "title": "CVE-2016-1837",
        "summary": "...",
        "body": "...",
        "category": "cve"
      }
    }
  ]
}

And thus a much better response from the LLM:

🤖 The implications of the CVEs mentioned are as follows: (...)

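Lovely! By tuning our hybrid search approach, we significantly improved performance with a very simple yet highly effective "Bob's your uncle" solution (as some would say)! This improvement underscores that while semantic search is a powerful tool, understanding and leveraging user intent is crucial for optimizing search results and the overall chat experience in a business reality. With that in mind, let's dive into the next challenge!

#2: Product versions

As we dug deeper into the challenges, another significant issue emerged: queries related to specific versions. Users frequently ask about features, migration guides, or version comparisons, but our initial search responses were not meeting expectations. For instance, let's take the following question: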

👨🏽 Can you compare Elasticsearch versions 8.14.3 and 8.14.2?

Our initial querying approach would return the following top 3:

Elasticsearch for Apache Hadoop version 8.14.1 | Elasticsearch for Apache Hadoop [8.14] | Elastic;
APM version 8.14 | Elastic Observability [8.14] | Elastic;
Elasticsearch for Apache Hadoop version 8.14.3 | Elasticsearch for Apache Hadoop [8.14] | Elastic.

Corresponding to the following _search response:

{
  ...
  "hits": [
    {
      "_index": "search-docs",
      "_id": "6807c4cf67ad0a52e02c4c2ef436194d2796faa454640ec64cc2bb999fe6633a",
      "_score": 29.79520,
      "_source": {
        "title": "Elasticsearch for Apache Hadoop version 8.14.1 | Elasticsearch for Apache Hadoop [8.14] | Elastic",
        "summary": "...",
        "body": "...",
        "url": "https://www.elastic.co/guide/en/elasticsearch/hadoop/current/eshadoop-8.14.1.html",
        "category": "documentation"
      }
    },
    {
      "_index": "search-docs",
      "_id": "9233c6b53a9d2e8b5daf1f68e7d82590516f0dfc23d979783ea1146854dc9d63",
      "_score": 27.19520,
      "_source": {
        "title": "APM version 8.14 | Elastic Observability [8.14] | Elastic",
        "summary": "...",
        "body": "...",
        "url": "https://www.elastic.co/guide/en/observability/current/apm-release-notes-8.14.html",
        "category": "documentation"
      }
    },
    {
      "_index": "search-docs",
      "_id": "392a39dd81933523d6018cac5ae4fc38b31de4ae3e1c0f1655d98191133c4e8a",
      "_score": 26.7618,
      "_source": {
        "title": "Elasticsearch for Apache Hadoop version 8.14.3 | Elasticsearch for Apache Hadoop [8.14] | Elastic",
        "summary": "...",
        "body": "...",
        "url": "https://www.elastic.co/guide/en/elasticsearch/hadoop/8.14/eshadoop-8.14.3.html",
        "category": "documentation"
      }
    }
  ]
}

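Since these results were irredeemably irrelevant, the chatbot ended up producing a completely off-target answer, hurting the overall user experience and trust in the Support AI Assistant: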

🤖 As of my last update, Elasticsearch 8.14.2 and 8.14.3 had not been released, and there were no official release notes available for these versions. Elasticsearch follows a rapid release cycle, and new versions often include bug fixes, performance improvements, and occasionally new features. However, without specific release notes or announcements from Elastic, I cannot provide details on what is new in these hypothetical versions.

To find the most up-to-date information on Elasticsearch releases, you should check the official Elastic website or their GitHub repository for the latest release notes. Release notes are typically published alongside the software release and provide a comprehensive list of changes, including new features, enhancements, bug fixes, and known issues

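After investigating the issue further, we gathered valuable insights. By replaying the query and looking at the search results, we noticed three serious problems with our crawled product documentation data that contributed to the overall poor performance:

Inaccurate semantic matching: Semantically, we clearly missed the mark. Why would we match such specific articles, including two specifically about Apache Hadoop, when the question was so much broader than Hadoop?
Multiple versions, same articles: Looking further into the hits for the originally asked question, we often noticed multiple versions of the same article with almost exactly the same content. That frequently led to a top 3 cluttered with irrelevant matches!
Wrong versions being returned: It's fair to expect that, having both the 8.14.1 and 8.14.3 versions of the Elasticsearch for Apache Hadoop article, we'd return the latter for our query, but that was not always the case.

From an impact perspective, we had to stop and fix those, or a considerable portion of user queries would be affected. Let's dive into the approach taken to solve all three!

A. Inaccurate semantic matching

After some examination of our data, we discovered that the root of the semantic matching problem was that the summary field for Product Documentation-type articles, generated by the crawler at ingestion time, was just the first few characters of the body. This redundancy misled our semantic model, causing it to generate vector embeddings that did not accurately represent the document's content in relation to user queries.

Being a data problem, it had to be solved in the data domain: leveraging GenAI and the GPT4 model, we made a team decision to craft a new AI Enrichment Service, introduced in the second installment of this blog series. We decided to build our own tool for a few specific reasons:

We had unused PTU resources available. Why not use them?
We needed to fill this data gap quickly, as it was probably the greatest relevance detractor.
We wanted a fully customizable approach to run our own experiments.

Modeled to be generic, the service's usage in our case boils down to generating four new fields for our data into a new index, then using Enrich Processors to make them available to the respective documents on the target indices at ingestion time. Here's a quick look at the specification for each field to be generated: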

const fields: FieldToGenerate[] = [
  {
    // A one-liner summary for the article.
    name: 'ai_subtitle', 
    strategy: GenerationStrategy.AbstractiveSummarizer,
  },
  {
    // A longer summary for the article.
    name: 'ai_summary', 
    strategy: GenerationStrategy.AbstractiveSummarizer,
  },
  {
    // A list of questions answered by the article.
    name: 'ai_questions_answered', 
    strategy: GenerationStrategy.QuestionSummarizer,
  },
  {
    // A condensed list of tags for the article.
    name: 'ai_tags',
    strategy: GenerationStrategy.TagsSummarizer,
  }
];

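After generating those fields and setting up the index Enrich Processors, the underlying RAG search indices are enriched with a new ai_fields object, with ELSER embeddings also available under ai_fields.ml.inference: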

{
  ...
  "_source": {
    "product_name": "Elasticsearch",
    "version": "8.14",
    "url": "https://www.elastic.co/guide/en/elasticsearch/hadoop/8.14/eshadoop-8.14.1.html",
    "ai_fields": {
      "ai_summary": "ES-Hadoop 8.14.1; tested against Elasticsearch 8.14.1. ES-Hadoop 8.14.1 is a compatibility release, aligning with Elasticsearch 8.14.1. This version ensures seamless integration and operation with Elasticsearch's corresponding version, maintaining feature parity and stability across the Elastic ecosystem.",
      "ai_subtitle": "ES-Hadoop 8.14.1 Compatibility Release",
      "ai_tags": [
        "Elasticsearch",
        "ES-Hadoop",
        "Compatibility",
        "Integration",
        "Version 8.14.1"
      ],
      "source_id": "6807c4cf67ad0a52e02c4c2ef436194d2796faa454640ec64cc2bb999fe6633a",
      "ai_questions_answered": [
        "What is ES-Hadoop 8.14.1?",
        "Which Elasticsearch version is ES-Hadoop 8.14.1 tested against?",
        "What is the purpose of the ES-Hadoop 8.14.1 release?"
      ],
      "ml": {
        "inference": {
          "ai_subtitle_expanded": {...},
          "ai_summary_expanded": {...},
          "ai_questions_answered_expanded": {...}
        }
      }
    }
  }
  ...
}
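
For readers unfamiliar with Enrich Processors, the wiring described above could look roughly like the sketch below, expressed as plain request bodies for an enrich policy and an ingest pipeline. The index name, policy name, and the choice of id as the lookup field are hypothetical; only the generated field names and source_id come from the document shown above.

```typescript
// Hypothetical enrich policy: looks up AI-generated fields by the source document ID.
const enrichPolicy = {
  name: "ai-fields-policy", // Assumed policy name.
  match: {
    indices: "ai-generated-fields", // Assumed index holding the generated fields.
    match_field: "source_id",
    enrich_fields: ["ai_subtitle", "ai_summary", "ai_questions_answered", "ai_tags"],
  },
};

// Hypothetical ingest pipeline applied to the target knowledge indices.
const ingestPipeline = {
  description: "Copy AI-generated fields onto matching knowledge documents",
  processors: [
    {
      enrich: {
        policy_name: enrichPolicy.name,
        field: "id", // Assumed: the incoming document's ID field.
        target_field: "ai_fields",
        max_matches: 1,
        ignore_missing: true,
      },
    },
  ],
};

console.log(JSON.stringify({ enrichPolicy, ingestPipeline }, null, 2));
```

An enrich processor writes both the match field and the enrich fields into target_field, which is consistent with source_id appearing under ai_fields in the document above. Note that an enrich policy has to be executed before a pipeline can use it.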

Now we can tune the query to use those fields, making for better overall semantic and keyword matching:

   ...
   // BM-25 Component. 
   {
      multi_match: {
        ...
        type: 'cross_fields',
        fields: [
          ...
          // Adding the `ai_fields` to the `cross_fields` matcher.
          'ai_fields.ai_subtitle',
          'ai_fields.ai_summary',
          'ai_fields.ai_questions_answered',
          'ai_fields.ai_tags',
        ],
      },
   },
   {
      multi_match: {
        ...
        type: 'phrase',
        fields: [
          ...
          // Adding the `ai_fields` to the `phrase` matcher.
          'ai_fields.ai_subtitle.stem',
          'ai_fields.ai_summary.stem',
          'ai_fields.ai_questions_answered.stem',
        ],
      },
   },
   ...
   // Semantic Search Component.
   {
      text_expansion: {
        // Adding `text_expansion` queries for `ai_fields` embeddings.
        'ai_fields.ml.inference.ai_subtitle_expanded.predicted_value': {
          model_id: '.elser_model_2',
          model_text: queryText,
        },
      },
    },
    {
      text_expansion: {
        'ai_fields.ml.inference.ai_summary_expanded.predicted_value': {
          model_id: '.elser_model_2',
          model_text: queryText,
        },
      },
    },
    {
      text_expansion: {
        'ai_fields.ml.inference.ai_questions_answered_expanded.predicted_value':
          {
            model_id: '.elser_model_2',
            model_text: queryText,
          },
      },
    },
    ...

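This alone should make our results more relevant. More than that, it opened up a lot of new possibilities for using AI-generated data throughout our applications, which we'll talk about in future blog posts.

Now, before retrying the query to check the results: what about the multiple versions problem?

B. Multiple versions, same articles

When duplicate content infiltrates the top positions, it diminishes the value of the data pool, diluting the effectiveness of GenAI responses and leading to a suboptimal user experience. In this context, a significant challenge we encountered was the presence of multiple versions of the same article. This redundancy, while contributing to a rich collection of version-specific data, often clutters the essential data feed to our LLM, reducing its diversity and therefore undermining response quality.

To address the problem, we used the Elasticsearch API collapse parameter, sifting through the noise and prioritizing only the most relevant version of a single piece of content. To do that, we computed a new slug field in our crawled Product Documentation documents to identify different versions of the same article, using it as the collapse field (or key).

Taking the "Sort search results" documentation page as an example, we have two versions of this article being crawled: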

Sort search results | Elasticsearch Guide [8.14] | Elastic
Sort search results | Elasticsearch Guide [7.17] | Elastic

Those two will generate the following slug:

guide-en-elasticsearch-reference-sort-search-results
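
The post does not show how the slug is computed. One plausible way to derive it, sketched below, is to take the page URL, drop the version segment and the file extension, and join the remaining path segments with hyphens. The function name, the list of version-like segments, and the example URLs are assumptions.

```typescript
// Path segments treated as version markers; this list is an assumption.
const VERSION_SEGMENT = /^(\d+\.\d+|current|master)$/;

const getSlugFromUrl = (url: string): string =>
  new URL(url).pathname
    .replace(/\.html$/, "") // Drop the file extension.
    .split("/")
    // Drop empty segments and the version segment.
    .filter((segment) => segment.length > 0 && !VERSION_SEGMENT.test(segment))
    .join("-");

// Assumed URLs for the two crawled versions of the page.
const urls = [
  "https://www.elastic.co/guide/en/elasticsearch/reference/8.14/sort-search-results.html",
  "https://www.elastic.co/guide/en/elasticsearch/reference/7.17/sort-search-results.html",
];

console.log(urls.map(getSlugFromUrl));
```

Both URLs map to the same slug, so every version of a page shares a single collapse key.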

Taking advantage of that, we can now tune the query to use collapse:

...
const searchQuery = {
  index: "knowledge-search",
  body: {
    ...
    query: {...},
    collapse: {
      // This is a "field alias" that will point to the `slug` field for product docs.
      field: "collapse_field" 
    }
  }
};
...

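As a result, we now show only the highest-scored document for each article in the search results, which definitely helps increase the diversity of knowledge sent to the LLM.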
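
To illustrate what collapsing does to a result list, here is a small in-memory sketch: it keeps the best-scoring hit per collapse key and then takes the top results. Elasticsearch performs this server-side; the ScoredHit shape and the sample scores below are made up for the example.

```typescript
interface ScoredHit {
  _id: string;
  _score: number;
  collapseKey: string; // Stands in for the value behind `collapse_field`.
}

const collapseByKey = (hits: ScoredHit[], size = 3): ScoredHit[] => {
  const bestPerKey = new Map<string, ScoredHit>();
  for (const hit of hits) {
    const current = bestPerKey.get(hit.collapseKey);
    // Keep only the highest-scoring hit for each key.
    if (!current || hit._score > current._score) bestPerKey.set(hit.collapseKey, hit);
  }
  return Array.from(bestPerKey.values())
    .sort((a, b) => b._score - a._score)
    .slice(0, size);
};

// Made-up hits: two versions of the same page compete for the top 3.
const hits: ScoredHit[] = [
  { _id: "sort-8.14", _score: 30, collapseKey: "sort-search-results" },
  { _id: "sort-7.17", _score: 28, collapseKey: "sort-search-results" },
  { _id: "paginate-8.14", _score: 25, collapseKey: "paginate-search-results" },
  { _id: "ilm-8.14", _score: 20, collapseKey: "ilm-overview" },
];

console.log(collapseByKey(hits).map((hit) => hit._id));
```

Without collapsing, the top 3 here would contain both versions of the same page; with it, the third slot goes to a different article.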

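C. Wrong versions being returned

Similar to the CVE matching problem above, we can boost results based on the specific versions mentioned, given that version is a separate field in our index. To do that, we used the following simple regex-based function to pull versions directly from the user question: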

/**
 * Extracts versions from the query text.
 * @param queryText The user query (or question).
 * @returns Array of versions found in the query text.
 * @example getVersionsFromQueryText("What's new in 8.14.2? And 8.14.3?") => ['8.14.2', '8.14.3'] 
 */
const getVersionsFromQueryText = (queryText: string) : string[] => {
  let results = null;
  const versions = [];
  // Will match `x.y.z` or `x.y` 
  const versionRegex = /\b\d+\.\d+(\.\d+)?\b/gi;

  while ((results = versionRegex.exec(queryText))) {
    const [version] = results;
    versions.push(version);
  }

  return versions;
};

We then add one more query to the should clause, boosting the version field accordingly and bringing the right versions to the top (whenever they are mentioned):

    // Additional boosting for versions.
    ...(versions.length > 0 ? [
    {
      bool: {
        filter: {
          term: {
            category: 'documentation',
          },
        },
        must: {
          terms: {
            version: versions,
            boost: 4,
          },
        },
      },
   }] : []),
    // BM-25 Component.
   {
      multi_match: {
        ...
        type: 'cross_fields',
        fields: [
          ...
    ...

With A, B, and C solved, we're probably ready to see some strong results!

Let's replay the question!

By replaying the previously tried question:

👨🏽 Can you compare Elasticsearch versions 8.14.3 and 8.14.2?

And therefore running the Elasticsearch query once again, we get dramatically better results, consisting of the following articles:

Elasticsearch version 8.14.3 | Elasticsearch Guide [master] | Elastic
Elasticsearch version 8.14.2 | Elasticsearch Guide [master] | Elastic
Release notes | Elasticsearch Guide [8.14] | Elastic

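Consequently, we get a much better answer generated by the LLM. More powerful than that: in the context of this conversation, the LLM is now aware of Elasticsearch versions that are newer than the model's cut-off date, and crafts correct answers around them: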

🤖 Certainly! Here's a formatted summary of the changes in Elasticsearch versions 8.14.2 and 8.14.3:

**Elasticsearch 8.14.2:**

- **Known Issue:**
  - Upgrading clusters from version 8.11.4 or earlier with non-master-eligible nodes may result in new functionality not being registered. This is fixed in version 8.15.0.

- **Bug Fixes:**
  - **Data streams:** Fixed lazy rollover request to rollover the target data stream once.
  - **ES|QL:** Addressed ESQL cancellation for exchange requests and issues with SingleValueQuery.LuceneQuery.
  - **Infra/Scripting:** Added array size check before returning array item in script doc values.
  - **Infra/Settings:** Ensured file settings readiness is guarded on file settings support.
  - **Machine Learning:** Resolved IndexOutOfBoundsException during inference.
  - **Mapping:** Re-defined index.mapper.dynamic setting for smoother 7.x to 8.x upgrades.
  - **Ranking:** Corrected the 'from' parameter when using sub_searches and rank.
  - **Search:** Supported hexstring for byte painless scorers and fixed tracking of collapse with docvalue_fields.

**Elasticsearch 8.14.3:**

- **Known Issue:**
  - Same as in 8.14.2 regarding the upgrade process and non-master-eligible nodes.

- **Bug Fixes:**
  - **Cluster Coordination:** Ensured tasks preserve versions in MasterService.
  - **ES|QL:** Implemented a compute listener.
  - **Mapping:** Set ignore_malformed only for the @timestamp automatically.
  - **TSDB:** Prohibited setting or updating the index.time_series.end_time in normal indices.

For both versions, users should be aware of the known issue related to upgrading clusters and follow the recommended resolution until the fix is available in version 8.15.0.

**URL References:**
- [Elasticsearch 8.14.2 Release Notes](https://www.elastic.co/guide/en/elasticsearch/reference/master/release-notes-8.14.2.html)
- [Elasticsearch 8.14.3 Release Notes](https://www.elastic.co/guide/en/elasticsearch/reference/master/release-notes-8.14.3.html)

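Exciting, right? But how can we now quantify the improvements to our query? Let's look at the numbers together!

Measuring success

To assess the performance of our changes, we compiled a test suite based on user behavior, with each case containing a question plus a curated list of results considered relevant to answering it. The cases cover a wide range of subjects and query styles, reflecting the diverse needs of our users. Here it is in full: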

...
const initialCases: Array<TestCase> = [
  {
    query: 'Can you compare Elasticsearch versions 8.14.3 and 8.14.2?',
    expectedResults: [...], // Elasticsearch version 8.14.3 | Elasticsearch Guide | Elastic, Elasticsearch version 8.14.2 | Elasticsearch Guide | Elastic.
  },
  {
    query: "What are the implications of CVE's 2019-10202, 2019-11756, 2019-15903?",
    expectedResults: [...], // CVE-2016-1837; CVE-2019-11756; CVE-2014-6439. 
  },
  {
    query: 'How to run the support diagnostics tool?',
    expectedResults: [...], // How to install and run the support diagnostics troubleshooting utility; How to install and run the ECK support diagnostics utility.
  },
  {
    query: 'How can I create data views in Kibana via API?',
    expectedResults: [...], // Create data view API | Kibana Guide | Elastic; How to create Kibana data view using api; Data views API | Kibana Guide | Elastic.
  },
  {
    query: 'What would the repercussions be of deleting a searchable snapshot and how would you be able to recover that index?',
    expectedResults: [...], // The repercussions of deleting a snapshot used by searchable snapshots; Does delete backing index delete the corresponding searchable snapshots, and vice versa?; Can one use a regular snapshot to restore searchable snapshot indices?; [ESS] Can deleted index data be recovered Elastic Cloud / Elasticsearch Service?.
  },
  {
    query: 'How can I create a data view in Kibana?',
    expectedResults: [...], // Create a data view | Kibana Guide | Elastic; Create data view API | Kibana Guide [8.2] | Elastic; How to create Kibana data view using api.
  },
  {
    query: 'Do we have an air gapped version of the Elastic Maps Service?',
    expectedResults: [...], // Installing in an air-gapped environment | Elastic Installation and Upgrade Guide [master] | Elastic; Connect to Elastic Maps Service | Kibana Guide | Elastic; 1.6.0 release highlights | Elastic Cloud on Kubernetes | Elastic.
  },
  {
    query: 'How to setup an enrich processor?',
    expectedResults: [...], // Set up an enrich processor | Elasticsearch Guide | Elastic; Enrich processor | Elasticsearch Guide | Elastic; Enrich your data | Elasticsearch Guide | Elastic.
  },
  {
    query: 'How to use index lifecycle management (ILM)?',
    expectedResults: [...], // Tutorial: Automate rollover with ILM | Elasticsearch Guide | Elastic; ILM: Manage the index lifecycle | Elasticsearch Guide | Elastic; ILM overview | Elasticsearch Guide | Elastic.
  },
  {
    query: 'How to rotate my ECE UI proxy certificates?',
    expectedResults: [...], // Manage security certificates | Elastic Cloud Enterprise Reference | Elastic; Generate ECE Self Signed Proxy Certificate; ECE Certificate Rotation (2.6 -> 2.10).
  },
  {
    query:
      'How to rotate my ECE UI proxy certificates between versions 2.6 and 2.10?',
    expectedResults: [...], // ECE Certificate Rotation (2.6 -> 2.10); Manage security certificates | Elastic Cloud Enterprise Reference | Elastic; Generate ECE Self Signed Proxy Certificate.
  }
];
...

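But how do we turn those test cases into quantifiable success? To that end, we employed Elasticsearch's Ranking Evaluation API along with the Precision at K (P@K) metric to determine how many relevant results are returned among the first K hits of a query. As we are interested in the top 3 results fed into the LLM, we set K = 3 here.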
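
As a quick illustration of the metric itself, P@K can be computed by hand as the share of the top K retrieved documents that are rated relevant. The sketch below is a minimal standalone version with made-up document IDs; it treats unrated documents as irrelevant and divides by the number of hits actually retrieved (at most K).

```typescript
const precisionAtK = (rankedIds: string[], relevantIds: Set<string>, k = 3): number => {
  const topK = rankedIds.slice(0, k);
  if (topK.length === 0) return 0;
  // Count how many of the top K hits are in the curated list of relevant results.
  const relevantRetrieved = topK.filter((id) => relevantIds.has(id)).length;
  return relevantRetrieved / topK.length;
};

// Made-up example: two of the top three hits are in the expected results.
const ranked = ["doc-a", "doc-x", "doc-b", "doc-c"];
const relevant = new Set(["doc-a", "doc-b", "doc-c"]);

console.log(precisionAtK(ranked, relevant).toFixed(3));
```

With K = 3, the only possible values are 0, 1/3, 2/3, and 1, corresponding to zero to three relevant results in the top 3.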

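To automate the computation of this metric against our curated list of questions and effectively assess our performance gains, we used TypeScript/Node.js to create a simple script wrapping everything. First, we define a function to make the corresponding Ranking Evaluation API calls: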

const rankingEvaluation = async (
  // Builds the search request to evaluate ("before" or "after").
  getSearchRequestFn: (queryText: string) => any
) => {
  const testSuite = getTestSuite();
  const rankEvalResult = await elasticsearchClient.rankEval({
    index: 'knowledge-search',
    body: {
      metric: {
        precision: {
          k: 3,
          relevant_rating_threshold: 1,
        },
      },
      // For each test case, we'll have one item here.
      requests: testSuite.map((testCase) => ({
        id: testCase.query,
        request: getSearchRequestFn(testCase.query),
        ratings: testCase.expectedResults.map(({ _id, _index }) => ({
          _index,
          _id,
          rating: 1, // A value >= 1 means relevant.
        })),
      })),
    },
  });
  // Return a normalized version of the data.
  return transformRankEvalResult(rankEvalResult);
};

After that, we need to define the search queries before and after the optimizations:

// Before the optimizations.
const getSearchRequestBefore = (queryText: string): any => ({
  query: {
    bool: {
      should: [
        {
          multi_match: {
            query: queryText,
            minimum_should_match: '1<-1 3<49%',
            type: 'cross_fields',
            fields: ['title', 'summary', 'body', 'id'],
          },
        },
        {
          multi_match: {
            query: queryText,
            type: 'phrase',
            boost: 9,
            fields: [
              'title.stem',
              'summary.stem',
              'body.stem',
            ],
          },
        },
        {
          text_expansion: {
            'ml.inference.title_expanded.predicted_value': {
              model_id: '.elser_model_2',
              model_text: queryText,
            },
          },
        },
        {
          text_expansion: {
            'ml.inference.summary_expanded.predicted_value': {
              model_id: '.elser_model_2',
              model_text: queryText,
            },
          },
        },
      ],
    },
  },
});

// After the optimizations.
const getSearchRequestAfter = (queryText: string): any => {
  const versions = getVersionsFromQueryText(queryText);
  const matchesKeywords = [
    {
      multi_match: {
        query: queryText,
        minimum_should_match: '1<-1 3<49%',
        type: 'cross_fields',
        fields: [
          'title',
          'summary',
          'body',
          'id',
          'ai_fields.ai_subtitle',
          'ai_fields.ai_summary',
          'ai_fields.ai_questions_answered',
          'ai_fields.ai_tags',
        ],
      },
    },
    {
      multi_match: {
        query: queryText,
        type: 'phrase',
        boost: 9,
        slop: 0,
        fields: [
          'title.stem',
          'summary.stem',
          'body.stem',
          'ai_fields.ai_subtitle.stem',
          'ai_fields.ai_summary.stem',
          'ai_fields.ai_questions_answered.stem',
        ],
      },
    },
  ];

  const matchesSemantics = [
    {
      text_expansion: {
        'ml.inference.title_expanded.predicted_value': {
          model_id: '.elser_model_2',
          model_text: queryText,
        },
      },
    },
    {
      text_expansion: {
        'ml.inference.summary_expanded.predicted_value': {
          model_id: '.elser_model_2',
          model_text: queryText,
        },
      },
    },
    {
      text_expansion: {
        'ai_fields.ml.inference.ai_subtitle_expanded.predicted_value': {
          model_id: '.elser_model_2',
          model_text: queryText,
        },
      },
    },
    {
      text_expansion: {
        'ai_fields.ml.inference.ai_summary_expanded.predicted_value': {
          model_id: '.elser_model_2',
          model_text: queryText,
        },
      },
    },
    {
      text_expansion: {
        'ai_fields.ml.inference.ai_questions_answered_expanded.predicted_value':
          {
            model_id: '.elser_model_2',
            model_text: queryText,
          },
      },
    },
  ];

  const matchesCvesAndVersions = [
    {
      bool: {
        filter: {
          term: {
            category: 'cve',
          },
        },
        must: {
          match: {
            title: {
              query: queryText,
              boost: 10,
            },
          },
        },
      },
    },
    ...(versions.length > 0
      ? [
          {
            bool: {
              filter: {
                term: {
                  category: 'documentation',
                },
              },
              must: {
                terms: {
                  version: versions,
                  boost: 4,
                },
              },
            },
          },
        ]
      : []),
  ];

  return {
    query: {
      bool: {
        should: [
          ...matchesKeywords,
          ...matchesSemantics,
          ...matchesCvesAndVersions,
        ]
      },
    },
    collapse: {
      // Alias to the collapse key for each underlying index. 
      field: 'collapse_field' 
    },
  };
};

Then, we output the resulting metrics for each query:

const [rankEvaluationBefore, rankEvaluationAfter] =
  await Promise.all([
    rankingEvaluation(getSearchRequestBefore), // The "before" query.
    rankingEvaluation(getSearchRequestAfter), // The "after" query.
  ]);

console.log(`Before -> Precision at K = 3 (P@K):`);
console.table(rankEvaluationBefore);

console.log(`After -> Precision at K = 3 (P@K):`);
console.table(rankEvaluationAfter);

// Computing the change in P@K.
const metricScoreBefore = rankEvaluationBefore.getMetricScore();
const metricScoreAfter = rankEvaluationAfter.getMetricScore();

const percentDifference =
  ((metricScoreAfter - metricScoreBefore) * 100) / metricScoreBefore;

console.log(`Change in P@K: ${percentDifference.toFixed(2)}%`);

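Finally, by running the script against our development Elasticsearch instance, we get the following output, showing the P@K (here, P@3) value for each query before and after the changes. That is, how many of the top 3 results are considered relevant to the response:

Improvements observed

Like an archer carefully adjusting for a precise shot, our recent efforts on relevance have brought considerable improvements in precision over time. Each of the previous enhancements was a small step toward better accuracy in our RAG search results and a better overall user experience. Here's how our efforts improved performance across various queries:

Before and after - P@K

Relevant results in the top 3: ❌ = 0, 🥉 = 1, 🥈 = 2, 🥇 = 3.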

Query Description	P@K Before	P@K After	Change
Support Diagnostics Tool	0.333 🥉	1.000 🥇	+200%
Air Gapped Maps Service	0.333 🥉	0.667 🥈	+100%
CVE Implications	0.000 ❌	1.000 🥇	∞
Enrich Processor Setup	0.667 🥈	0.667 🥈	0%
Proxy Certificates Rotation	0.333 🥉	0.333 🥉	0%
Proxy Certificates Version-specific Rotation	0.333 🥉	0.333 🥉	0%
Searchable Snapshot Deletion	0.667 🥈	1.000 🥇	+50%
Index Lifecycle Management Usage	0.667 🥈	0.667 🥈	0%
Creating Data Views via API in Kibana	0.333 🥉	0.667 🥈	+100%
Kibana Data View Creation	1.000 🥇	1.000 🥇	0%
Comparing Elasticsearch Versions	0.000 ❌	0.667 🥈	∞
Maximum Bucket Size in Aggregations	0.000 ❌	0.333 🥉	∞

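Average P@K improvement: +78.41% 🏆🎉. Let's summarize a few observations about our results:

Significant improvements: With a measured overall relevance increase of +78.41%, the following queries (Support Diagnostics Tool, CVE Implications, Searchable Snapshot Deletion, Comparing Elasticsearch Versions) showed substantial enhancements. These areas not only made it onto the podium of search relevance but did so convincingly, far outperforming their initial results!
Opportunities for optimization: Certain queries (Enrich Processor Setup, Kibana Data View Creation, and Proxy Certificates Rotation) showed reliable performance with no regressions. These results underscore the effectiveness of our core search strategies. However, they also remind us that precision in search is an ongoing effort. These unchanged results highlight where we'll focus on sharpening our aim in the next iterations. As we continue, we'll also expand our test suite, incorporating more diverse and carefully selected use cases to ensure our enhancements are both relevant and robust.

What's next? 🔎

The path ahead is full of opportunities for further gains, and with each iteration we aim to push our RAG implementation's performance and overall experience even higher. So, let's discuss the areas we are currently interested in!

Our data can be further optimized for search: Although we have a large base of sources, we observed that semantically close search candidates often lead to less effective chatbot responses. Some of the crawled pages aren't really valuable and often create noise that hurts relevance. To solve that, we can curate and enhance our existing knowledge base by applying a range of techniques, making it lean and effective to ensure an optimal search experience.
Chatbots must handle conversations, and so must RAG searches: Asking follow-up questions is common user behavior. A question asking "How to configure Elasticsearch on a Linux machine?" followed by "What about Windows?" should query something like "How to configure Elasticsearch on a Windows machine?" (not the raw second question). The RAG query approach should find the most relevant content for the entire context of the conversation.
Conditional context inclusion: By extracting the semantic meaning of the user question, it would be possible to conditionally include pieces of data as context, saving on token limits, making the generated content even more relevant, and potentially saving round trips for search and external services.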