Elasticsearch open inference API adds support for OpenAI chat completions


By Tim Grein

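We are excited to announce the latest innovation in Elasticsearch: the integration of OpenAI Chat Completions into Elastic's inference API. This new feature marks another step in our journey of bringing cutting-edge AI capabilities to Elasticsearch, and it adds easy-to-use functionality such as generating human-like text completions.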

For more on using OpenAI Chat Completions, read the article "ChatGPT and Elasticsearch: OpenAI meets private data (Part 2)".

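The essence of continuous innovation at Elastic

Elastic invests heavily in everything AI. We recently released a number of new features and exciting integrations:

Elasticsearch open inference API adds support for Cohere embeddings
Introducing Elasticsearch vector database to Azure OpenAI Service On Your Data (preview)
Speeding up multi-graph vector search
...explore more of Elasticsearch Labs to learn about recent developments.

The new completion task type in our inference API, with OpenAI as the first supported provider, is already available in our stateless offering on Elastic Cloud. It will soon be available to everyone in our next release.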

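Using the new completion API

In this short guide we show a simple example of how to use the new completion task type of the inference API during document ingestion. Please refer to the Elastic Search Labs GitHub repository for more in-depth guides and interactive notebooks.

For the following guide to work, you need an active OpenAI account and an API key. Refer to OpenAI's quickstart guide for the steps you need to follow. You can choose from a variety of OpenAI models. In the following example we use gpt-3.5-turbo.

In Kibana, you can use the Console to send the following requests to Elasticsearch without having to set up an IDE.

First, configure a model that will perform the completion task: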

PUT _inference/completion/openai_chat_completions
{
    "service": "openai",
    "service_settings": {
        "api_key": "<api-key>",
        "model_id": "gpt-3.5-turbo"
    }
}

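After running this command you should see a 200 OK status, indicating that the model is set up correctly and ready to run inference on arbitrary text.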
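Outside of Kibana, the same request can be prepared from a script. The following is a minimal Python sketch using only the standard library. It only builds the request object and does not send it. The helper name `build_create_endpoint_request` and the base URL `http://localhost:9200` are assumptions for illustration, and authentication against your cluster is deliberately left out.

```python
import json
import urllib.request


def build_create_endpoint_request(base_url, api_key,
                                  endpoint_id="openai_chat_completions",
                                  model_id="gpt-3.5-turbo"):
    """Build (but do not send) the PUT request that configures the completion model."""
    body = {
        "service": "openai",
        "service_settings": {
            "api_key": api_key,
            "model_id": model_id,
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/_inference/completion/{endpoint_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical local cluster URL; replace with your own deployment and credentials.
request = build_create_endpoint_request("http://localhost:9200", "<api-key>")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it, once the headers your deployment needs for authentication have been added.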

Now you can call the configured model to run inference on arbitrary text input:

POST _inference/completion/openai_chat_completions
{
    "input": "What is Elastic?"
}

You will receive a response with status code 200 OK that looks similar to the following:

{
    "completion": [
        {
            "result": "Elastic is a software company that provides a range of products and solutions for search, logging, security, and analytics. Its flagship product, Elasticsearch, is a distributed, RESTful search and analytics engine that is used for full-text search, structured search, and analytics. Elastic also offers other products such as Logstash for log collection and parsing, Kibana for data visualization and dashboarding, and Beats for lightweight data shippers. These products can be combined to create powerful data analysis and monitoring solutions for organizations of all sizes."
        }
    ]
}
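If you call the endpoint from code rather than from the Console, the generated text has to be pulled out of the `completion` array. A small Python sketch of that step, assuming the response has already been parsed into a dictionary with the shape shown above (the helper name `extract_completions` is hypothetical, and the sample text is shortened):

```python
def extract_completions(response):
    """Return the generated texts from a completion response shaped like the one above."""
    return [item["result"] for item in response.get("completion", [])]


# Shortened stand-in for the response body shown above.
sample_response = {
    "completion": [
        {"result": "Elastic is a software company that provides search products."}
    ]
}

for text in extract_completions(sample_response):
    print(text)
```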

The next command creates a sample document, which we will summarize using the model we just configured:

POST _bulk
{ "index" : { "_index" : "docs" } }
{"content": "You know, for search (and analysis) Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack. Logstash and Beats facilitate collecting, aggregating, and enriching your data and storing it in Elasticsearch. Kibana enables you to interactively explore, visualize, and share insights into your data and manage and monitor the stack. Elasticsearch is where the indexing, search, and analysis magic happens. Elasticsearch provides near real-time search and analytics for all types of data. Whether you have structured or unstructured text, numerical data, or geospatial data, Elasticsearch can efficiently store and index it in a way that supports fast searches. You can go far beyond simple data retrieval and aggregate information to discover trends and patterns in your data. And as your data and query volume grows, the distributed nature of Elasticsearch enables your deployment to grow seamlessly right along with it. While not every problem is a search problem, Elasticsearch offers speed and flexibility to handle data in a wide variety of use cases: Add a search box to an app or website Store and analyze logs, metrics, and security event data Use machine learning to automatically model the behavior of your data in real time Use Elasticsearch as a vector database to create, store, and search vector embeddings Automate business workflows using Elasticsearch as a storage engine Manage, integrate, and analyze spatial information using Elasticsearch as a geographic information system (GIS) Store and process genetic data using Elasticsearch as a bioinformatics research tool We’re continually amazed by the novel ways people use search. But whether your use case is similar to one of these, or you’re using Elasticsearch to tackle a new problem, the way you work with your data, documents, and indices in Elasticsearch is the same."}
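The bulk request body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. When there are more documents than you want to paste by hand, the body can be generated. A minimal Python sketch, where the helper name `build_bulk_body` and the sample documents are assumptions for illustration:

```python
import json


def build_bulk_body(index, docs):
    """Serialize documents into the newline-delimited body expected by the _bulk API."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # source line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"content": "Elasticsearch is a distributed search and analytics engine."},
    {"content": "Kibana lets you explore and visualize your data."},
]
print(build_bulk_body("docs", docs), end="")
```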

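To summarize multiple documents, we set up a summarization pipeline: an ingest pipeline consisting of a script processor, an inference processor, and a remove processor.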

PUT _ingest/pipeline/summarization_pipeline
{
    "processors": [
        {
            "script": {
                "source": "ctx.prompt = 'Please summarize the following text: ' + ctx.content"
            }
        },
        {
            "inference": {
                "model_id": "openai_chat_completions",
                "input_output": {
                    "input_field": "prompt",
                    "output_field": "summary"
                }
            }
        },
        {
            "remove": {
                "field": "prompt"
            }
        }
    ]
}

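The pipeline simply prefixes the content with the instruction "Please summarize the following text: " and stores the result in a temporary field, so that the configured model knows what to do with the text. You can of course change this text to whatever you need, which unlocks a variety of other popular use cases:

Question answering
Translation
...and many more!

The pipeline removes the temporary field after performing inference.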
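Since only the instruction text changes between these use cases, the pipeline definition can be generated from a template. The sketch below is one possible way to do that in Python. The helper name `build_prompt_pipeline` and the example instructions are hypothetical. As a deviation from the pipeline above, it passes the instruction through the script processor's `params` option instead of embedding it in the script source, so that instructions containing quote characters do not break the script.

```python
import json


def build_prompt_pipeline(instruction, output_field="summary",
                          endpoint_id="openai_chat_completions"):
    """Build an ingest pipeline body that prefixes ctx.content with an instruction."""
    return {
        "processors": [
            {
                # Store instruction + content in the temporary "prompt" field.
                "script": {
                    "source": "ctx.prompt = params.instruction + ctx.content",
                    "params": {"instruction": instruction},
                }
            },
            {
                # Send the prompt to the configured completion model.
                "inference": {
                    "model_id": endpoint_id,
                    "input_output": {
                        "input_field": "prompt",
                        "output_field": output_field,
                    },
                }
            },
            # Drop the temporary field once inference is done.
            {"remove": {"field": "prompt"}},
        ]
    }


translation_pipeline = build_prompt_pipeline(
    "Please translate the following text to German: ",
    output_field="translation",
)
print(json.dumps(translation_pipeline, indent=4))
```

The printed JSON can be used as the body of a `PUT _ingest/pipeline/<name>` request in the Console.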

Now we send our document(s) through the summarization pipeline by calling the reindex API.

POST _reindex
{
    "source": {
        "index": "docs",
        "size": 50
    },
    "dest": {
        "index": "docs_summaries",
        "pipeline": "summarization_pipeline"
    }
}

Your documents are now summarized and ready to be searched.

POST docs_summaries/_search
{
    "query": {
        "match_all": { }
    }
}
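To work with the results in code, the summaries can be read from the search hits. A short Python sketch, assuming the search response has been parsed into a dictionary and that the pipeline wrote its output to the `summary` field as configured above (the helper name `extract_summaries` and the sample response are illustrative, not real output):

```python
def extract_summaries(search_response, field="summary"):
    """Collect the given field from every hit of a parsed search response."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [
        hit["_source"][field]
        for hit in hits
        if field in hit.get("_source", {})
    ]


# Illustrative response shape only; the real response contains more metadata.
sample_search_response = {
    "hits": {
        "hits": [
            {"_index": "docs_summaries", "_source": {"content": "long text", "summary": "short text"}},
            {"_index": "docs_summaries", "_source": {"content": "no summary yet"}},
        ]
    }
}
print(extract_summaries(sample_search_response))
```

Hits without the field are skipped rather than raising an error, which is a design choice for this sketch and may not suit every use case.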

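That's it: with just a few simple API calls you have created a powerful summarization pipeline that can be used with any ingest mechanism! Summarization is useful in many situations, for example to condense large pieces of text before generating semantic embeddings, or to turn long texts into concise summaries. This can reduce storage costs and speed up time to value, for example if you are only interested in the summaries of large documents. By the way, if you want to extract text from binary documents, take a look at our open-source data extraction service!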