Common Elasticsearch queries and searching with the Java API Client
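1. Search requirements
Modeled on the book list page of Douban Read (豆瓣阅读).
Requirements:
- Match the search terms against the title, author, and summary fields of the stored books, and highlight the matched terms in red
- Let users sort the results by overall relevance, most popular, most recently updated, best-selling, or highest rated
- Return 10 results per page, together with the total number of matching documents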
2. Set up the test environment
2.1 Define the ES fields for the requirements
mapping.json
{
  "mappings": {
    "properties": {
      "title": {
        "analyzer": "standard",
        "type": "text"
      },
      "author": {
        "analyzer": "standard",
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "contentDesc": {
        "analyzer": "standard",
        "type": "text"
      },
      "wordCount": {
        "type": "double"
      },
      "price": {
        "type": "double"
      },
      "cover": {
        "type": "keyword"
      },
      "heatCount": {
        "type": "integer"
      },
      "updateTime": {
        "type": "date"
      }
    }
  }
}
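Field descriptions:
- id: the unique identifier of a book. mapping.json does not declare an id field; in the examples below the identifier is supplied as the document _id (for example POST /douban/_doc/1001).
- title (text): the book title, analyzed with the standard analyzer (which is also Elasticsearch's default analyzer).
- author (text): the author, also a text field using the standard analyzer. It additionally defines a keyword sub-field, author.keyword, which is typically used for ==exact matching and aggregations==.
- contentDesc (text): the content summary, a text field using the standard analyzer.
- wordCount (double): the word count, stored as a double-precision floating-point number.
- price (double): the price of the book, also stored as a double.
- cover (keyword): the cover image URL; keyword fields are typically used for exact matching.
- heatCount (integer): the popularity ("heat") count, used for the "most popular" sort.
- updateTime (date): the last update time, used for the "most recently updated" sort.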
2.2 Create the index and mapping
2.3 Add test data
POST /douban/_doc/1001
{
  "title": "诗云",
  "author": "刘慈欣",
  "contentDesc": "伊依一行三人乘坐一艘游艇在南太平洋上做吟诗航行,平时难得一见的美洲大陆清晰地显示在天空中,在东半球构成的覆盖世界的巨大穹顶上,大陆好像是墙皮脱落的区域…",
  "wordCount": 18707,
  "price": 6.99,
  "cover": "https://pic.arkread.com/cover/ebook/f/19534800.1653698501.jpg!cover_default.jpg",
  "heatCount": 201,
  "updateTime": "2023-12-20"
}
POST /douban/_doc/1002
{
  "title": "三体2·黑暗森林",
  "author": "刘慈欣",
  "contentDesc": "征服世界的中国科幻神作!包揽九项世界顶级科幻大奖!《三体》获得第73届“雨果奖”最佳长篇奖!",
  "wordCount": 318901,
  "price": 32.00,
  "cover": "https://pic.arkread.com/cover/ebook/f/110344476.1653700299.jpg!cover_default.jpg",
  "heatCount": 545,
  "updateTime": "2023-12-25"
}
POST /douban/_doc/1003
{
  "title": "三体前传:球状闪电",
  "author": "刘慈欣",
  "contentDesc": "征服世界的中国科幻神作!包揽九项世界顶级科幻大奖!《三体》获得第73届“雨果奖”最佳长篇奖!",
  "wordCount": 181119,
  "price": 35.00,
  "cover": "https://pic.arkread.com/cover/ebook/f/116984494.1653699856.jpg!cover_default.jpg",
  "heatCount": 765,
  "updateTime": "2022-11-12"
}
POST /douban/_doc/1004
{
  "title": "全频带阻塞干扰",
  "author": "刘慈欣",
  "contentDesc": "这是一个场面浩大而惨烈的故事。21世纪的某年,以美国为首的北约发起了对俄罗斯的全面攻击。在残酷的保卫战中,俄国的电子战设备无力抵挡美国的进攻",
  "wordCount": 28382,
  "price": 6.99,
  "cover": "https://pic.arkread.com/cover/ebook/f/19532617.1653698557.jpg!cover_default.jpg",
  "heatCount": 153,
  "updateTime": "2021-03-23"
}
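The four POST requests above add one document each. The same data could also be sent in a single `_bulk` request, whose body is newline-delimited JSON: an action line naming the index and `_id`, followed by the document on its own line, with a newline after the last line. The sketch below only builds that body as a string with the JDK; the class and method names are hypothetical, it assumes each source document is already serialized on one line, and the sample sources are trimmed to a few ASCII fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds an NDJSON body for POST /_bulk from id -> single-line JSON source.
class BulkBodyBuilder {

    static String build(String index, Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : docsById.entrySet()) {
            // Action/metadata line: target index and _id for the document on the next line
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(e.getKey()).append("\"}}\n");
            // Source line: the document itself, which must not contain raw newlines
            body.append(e.getValue()).append('\n');
        }
        // The bulk API requires a trailing newline; the loop above always ends with one
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1001", "{\"price\":6.99,\"heatCount\":201,\"updateTime\":\"2023-12-20\"}");
        docs.put("1004", "{\"price\":6.99,\"heatCount\":153,\"updateTime\":\"2021-03-23\"}");
        System.out.print(build("douban", docs));
    }
}
```

The resulting string can be sent as the body of `POST /_bulk` with the `application/x-ndjson` content type.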
3. Run queries
3.1 Query by primary key
# Get API: fetch a single document directly by its _id
GET /douban/_doc/1001
# Alternatively, search on the _id field
POST /douban/_search
{
  "query": {
    "match": {
      "_id": 1001
    }
  }
}
3.2 Match-all query
POST /douban/_search
{
  "query": {
    "match_all": {}
  }
}
3.3 Paginated query
POST /douban/_search
{
  "query": {
    "match_all": {}
  },
  "from": 1,
  "size": 2
}
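`from` is a zero-based offset into the hit list, not a page number, so `"from": 1, "size": 2` above skips the first hit and returns the second and third. To meet the 10-results-per-page requirement, a page number therefore has to be converted into an offset. A minimal sketch, assuming 1-based page numbers; the class and method names are hypothetical.

```java
// Hypothetical helper: converts a 1-based page number into the "from" offset of a search request.
class PageOffset {

    static final int PAGE_SIZE = 10; // page size required in section 1

    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        // Skip every hit that belongs to the preceding pages
        return (page - 1) * size;
    }

    // Number of pages needed to show totalHits results (ceiling division)
    static long pageCount(long totalHits, int size) {
        return (totalHits + size - 1) / size;
    }

    public static void main(String[] args) {
        System.out.println(from(3, PAGE_SIZE));      // 20
        System.out.println(pageCount(4, PAGE_SIZE)); // 1
    }
}
```

By default Elasticsearch rejects requests where `from + size` exceeds `index.max_result_window` (10,000); deeper paging needs `search_after`.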
3.4 Sorted query
POST /douban/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}
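Section 1 asks for several sort orders, and the mapping provides fields for three of them. The sketch below maps those options to a sort field and direction in the same shape as the `sort` entry above; `_score` is the relevance score that Elasticsearch sorts by when no sort is given. The enum and its constant names are hypothetical, and the best-selling and highest-rated options are left out because the mapping has no sales or rating field.

```java
// Hypothetical mapping from UI sort options to Elasticsearch sort clauses.
enum SortOption {
    COMPREHENSIVE("_score", "desc"),   // overall relevance
    MOST_POPULAR("heatCount", "desc"), // highest heat count first
    LATEST("updateTime", "desc");      // most recently updated first

    final String field;
    final String order;

    SortOption(String field, String order) {
        this.field = field;
        this.order = order;
    }

    // Renders one entry of the "sort" array, e.g. {"heatCount":{"order":"desc"}}
    String toSortJson() {
        return "{\"" + field + "\":{\"order\":\"" + order + "\"}}";
    }

    public static void main(String[] args) {
        for (SortOption option : values()) {
            System.out.println(option + " -> " + option.toSortJson());
        }
    }
}
```

For `COMPREHENSIVE` the sort could also simply be omitted, since hits are then ordered by `_score`.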
3.5 Full-text search
POST /douban/_search
{
  "query": {
    "match": {
      "title": "三体球闪"
    }
  }
}
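No title contains the exact string 三体球闪, yet the query returns hits. This happens because the standard analyzer splits Chinese text into single-character terms, and a `match` query by default ORs those terms together. The JDK-only sketch below imitates this for Han characters only and counts how many query terms each title contains. It is an approximation for illustration (it ignores non-Han text, unlike the real tokenizer), and the class name is hypothetical. Chinese strings are written as Unicode escapes so the file compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy approximation of the standard analyzer on Chinese text: every Han character is one term.
// Non-Han characters are simply dropped here, which the real analyzer does NOT do.
class HanTermMatcher {

    static List<String> hanTerms(String text) {
        List<String> terms = new ArrayList<>();
        text.codePoints()
            .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
            .forEach(cp -> terms.add(new String(Character.toChars(cp))));
        return terms;
    }

    // Number of distinct query terms that also occur in the field value (match = OR of terms)
    static int matchedTerms(String query, String fieldValue) {
        Set<String> fieldTerms = new LinkedHashSet<>(hanTerms(fieldValue));
        int count = 0;
        for (String term : new LinkedHashSet<>(hanTerms(query))) {
            if (fieldTerms.contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String query = "\u4e09\u4f53\u7403\u95ea";                          // query from 3.5
        String title = "\u4e09\u4f53\u524d\u4f20:\u7403\u72b6\u95ea\u7535"; // title of document 1003
        System.out.println(hanTerms(query).size());      // 4
        System.out.println(matchedTerms(query, title));  // 4
    }
}
```

Under this model document 1003 shares four terms with the query and 1002 shares two, which is roughly why 1003 is expected to score higher.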
3.6 Highlighted search
POST /douban/_search
{
  "query": {
    "match": {
      "title": "三体球闪"
    }
  },
  "highlight": {
    "fields": {
      "title": {
        "pre_tags": [
          "<font color='red'>"
        ],
        "post_tags": [
          "</font>"
        ]
      }
    }
  }
}
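In the response, highlighted fragments are returned per hit under `highlight`, keyed by field name, and a field appears there only if it matched. A common way to render a result list is therefore to prefer the highlighted fragment and fall back to the stored value. A JDK-only sketch with a hypothetical helper name; with the Java API Client the map would come from `Hit.highlight()`, which returns a `Map<String, List<String>>`.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: chooses what to display for one field of a search hit.
class HighlightPicker {

    static String display(Map<String, List<String>> highlight, String field, String sourceValue) {
        List<String> fragments = highlight == null ? null : highlight.get(field);
        if (fragments == null || fragments.isEmpty()) {
            // The field did not match the query: show the original value
            return sourceValue;
        }
        // Use the first fragment; long fields such as contentDesc may return several
        return fragments.get(0);
    }

    public static void main(String[] args) {
        Map<String, List<String>> hl = Collections.singletonMap(
                "title", Collections.singletonList("<font color='red'>A</font>bc"));
        System.out.println(display(hl, "title", "Abc"));  // the highlighted fragment
        System.out.println(display(hl, "author", "Liu")); // falls back to "Liu"
    }
}
```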
3.7 bool query
Full-text search the title for "三体球闪" and require the price to be 35:
POST /douban/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "三体球闪"
          }
        },
        {
          "term": {
            "price": 35
          }
        }
      ]
    }
  }
}
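`must` keeps a document only if every clause matches. On the test data, the `match` clause alone would also return 三体2·黑暗森林 (it shares two title terms with the query), and the `term` clause on price removes it because its price is 32; `term` on a double field compares numerically, so 35 matches 35.00. The in-memory analogue below only illustrates that AND logic and is not how Elasticsearch evaluates queries. The class is hypothetical, and the per-title term counts are precomputed by hand from the test data in 2.3 under the single-character analysis of the standard analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory illustration of bool/must: a book is kept only if ALL conditions hold.
class BoolMustDemo {

    static class Book {
        final String id;
        final int matchedTitleTerms; // query terms found in the title (precomputed)
        final double price;

        Book(String id, int matchedTitleTerms, double price) {
            this.id = id;
            this.matchedTitleTerms = matchedTitleTerms;
            this.price = price;
        }
    }

    static List<String> mustMatch(List<Book> books, double price) {
        List<String> ids = new ArrayList<>();
        for (Book b : books) {
            boolean titleMatches = b.matchedTitleTerms > 0; // the match clause
            boolean priceMatches = b.price == price;        // the term clause
            if (titleMatches && priceMatches) {             // must = logical AND
                ids.add(b.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>();
        books.add(new Book("1001", 0, 6.99));
        books.add(new Book("1002", 2, 32.00));
        books.add(new Book("1003", 4, 35.00));
        books.add(new Book("1004", 0, 6.99));
        System.out.println(mustMatch(books, 35)); // [1003]
    }
}
```

An exact condition like the price check does not need to influence scoring, so in Elasticsearch it can also be placed in a `filter` clause instead of `must`.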
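3.8 Multi-field full-text search
Run a full-text match on the title, author, and summary fields, and highlight matches in all three fields (with the default <em> tags, since no pre_tags/post_tags are set here)
POST /douban/_search
{
  "query": {
    "multi_match": {
      "query": "三体球闪",
      "fields": [
        "title",
        "author",
        "contentDesc"
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "author": {},
      "contentDesc": {}
    }
  }
}
3.9 Combined search
Run a full-text match on the title, author, and summary fields, and highlight matches in all three fields.
Add pagination, sort by update time in descending order, and return only the required fields.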
POST /douban/_search
{
  "query": {
    "multi_match": {
      "query": "三体球闪",
      "fields": [
        "title",
        "author",
        "contentDesc"
      ]
    }
  },
  "from": 0,
  "size": 2,
  "_source": [
    "title",
    "author",
    "price",
    "wordCount"
  ],
  "sort": [
    {
      "updateTime": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "fields": {
      "title": {},
      "author": {},
      "contentDesc": {}
    }
  }
}
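4. Integrating Elasticsearch into a Spring project
Reference: Installation | Elasticsearch Java API Client [7.17] | Elastic
4.1 Create a Spring project and add the ES dependencies
To use Java 8, open pom.xml, change the parent version and the java.version value, and then reload the Maven project. Spring Boot 3.x requires Java 17, so Java 8 needs a 2.x parent such as the 2.5.15 used below.
Starting with Elasticsearch 7.15, Elastic deprecated its high-level client, RestHighLevelClient, and introduced a new Java client, the Elasticsearch Java API Client, which is the officially recommended client from Elasticsearch 8.0 onward.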
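API | Description |
---|---|
TransportClient | Accesses the cluster over TCP (transport protocol); Java only; deprecated in 7.x and removed in 8.x. |
Java Low Level REST Client (RestClient) | Low-level REST client with minimal dependencies. |
Java High Level REST Client (RestHighLevelClient) | High-level REST client built on the low-level client; deprecated since 7.15, with no removal announced at the time; Elastic recommends migrating to the Java API Client. |
Java API Client (ElasticsearchClient) | Calls the language-neutral HTTP/JSON REST API; the recommended client, built on the low-level client and available since 7.15. |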
<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>7.17.11</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.12.3</version>
</dependency>
<!-- This dependency fixes: ClassNotFoundException: jakarta.json.spi.JsonProvider
     See: https://github.com/elastic/elasticsearch-java/issues/311 -->
<dependency>
    <groupId>jakarta.json</groupId>
    <artifactId>jakarta.json-api</artifactId>
    <version>2.0.1</version>
</dependency>
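The complete pom.xml follows. Note: <elasticsearch.version>7.17.11</elasticsearch.version> must be added to properties; otherwise the Elasticsearch dependency versions managed by the Spring Boot parent are not overridden.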
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.5.15</version>
        <relativePath/>
    </parent>
    <groupId>com.zhouquan</groupId>
    <artifactId>client</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>client</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>8</java.version>
        <lombok.version>1.18.22</lombok.version>
        <elasticsearch.version>7.17.11</elasticsearch.version>
        <jakarta.version>2.0.1</jakarta.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
            <version>7.17.11</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.12.3</version>
        </dependency>
        <!-- This dependency fixes: ClassNotFoundException: jakarta.json.spi.JsonProvider
             See: https://github.com/elastic/elasticsearch-java/issues/311 -->
        <dependency>
            <groupId>org.glassfish</groupId>
            <artifactId>jakarta.json</artifactId>
            <version>${jakarta.version}</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>${lombok.version}</version>
        </dependency>
        <!-- Apache Commons IO -->
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.11.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
4.2 Add an ES client configuration class
The client is registered as a Spring bean; inject it with @Resource private ElasticsearchClient client; wherever it is needed.
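import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import lombok.extern.slf4j.Slf4j;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.annotation.Resource;
import java.util.Arrays;
import java.util.List;

@Configuration
@Slf4j
public class EsClient {

    @Resource
    private EsConfig esConfig;

    /**
     * Bean definition that creates the ElasticsearchClient instance.
     *
     * @return an ElasticsearchClient configured with a RestClient and its transport
     */
    @Bean
    public ElasticsearchClient elasticsearchClient() {
        // Build the host list from the configured cluster nodes (e.g. "http://localhost:9200")
        List<String> clusterNodes = esConfig.getClusterNodes();
        HttpHost[] httpHosts = clusterNodes.stream().map(HttpHost::create).toArray(HttpHost[]::new);
        // Create the low-level client
        RestClient restClient = RestClient.builder(httpHosts).build();
        // Transport that serializes JSON with Jackson
        ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
        ElasticsearchClient client = new ElasticsearchClient(transport);
        // Log the connected nodes
        log.info("Elasticsearch client nodes: {}", Arrays.toString(httpHosts));
        return client;
    }
}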
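`EsConfig` is injected above but its source is not shown. A minimal sketch, assuming it only carries the list of node addresses in `scheme://host:port` form, which is what `HttpHost.create` parses. In the Spring project it would also need annotations such as `@Component` and `@ConfigurationProperties(prefix = "elasticsearch")` so that a property like `elasticsearch.cluster-nodes` in application.yml is bound to `clusterNodes`; that prefix and property name are hypothetical, and the annotations are omitted here so the class compiles with the JDK alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reconstruction of EsConfig: a plain holder for the cluster node addresses.
// In Spring Boot it would additionally be annotated for configuration-property binding.
class EsConfig {

    // Addresses such as "http://localhost:9200"
    private List<String> clusterNodes = new ArrayList<>();

    public List<String> getClusterNodes() {
        return clusterNodes;
    }

    public void setClusterNodes(List<String> clusterNodes) {
        this.clusterNodes = clusterNodes;
    }

    public static void main(String[] args) {
        EsConfig config = new EsConfig();
        config.setClusterNodes(Arrays.asList("http://localhost:9200", "http://localhost:9201"));
        System.out.println(config.getClusterNodes());
    }
}
```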
4.3 Create an index with the Java API Client
Reference: Using the Java API Client
/**
 * Create the index from the mapping file on the classpath
 */
@Test
void createIndex() throws IOException {
    // Load src/main/resources/mapping/douban.json and close the stream afterwards
    try (InputStream input = getClass().getClassLoader().getResourceAsStream("mapping/douban.json")) {
        CreateIndexRequest req = CreateIndexRequest.of(b -> b
                .index("douban_v1")
                .withJson(input)
        );
        boolean created = client.indices().create(req).acknowledged();
        log.info("Index created: {}", created);
    }
}
4.4 Save documents
Entity class DouBan.java
package com.zhouquan.client.entity;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.Date;

/**
 * @author ZhouQuan
 * @description Book document stored in the douban index
 * @date 2024-01-09 15:54
 **/
@Data
@AllArgsConstructor
@NoArgsConstructor
public class DouBan {
    private String id;
    private String title;
    private String author;
    private String contentDesc;
    private Integer wordCount;
    private Double price;
    private String cover;
    private Integer heatCount;
    // Jackson writes java.util.Date as epoch milliseconds by default, which the date field accepts
    private Date updateTime;
}
4.4.1 Index a single document
// Demonstrates several equivalent ways to index a document; in practice pick one
public String indexSingleDoc() {
    DouBan douBan = new DouBan("1211", "河边的错误", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    IndexResponse indexResponse;
    try {
        // 1. Fluent DSL with a lambda builder
        indexResponse = client.index(i -> i
                .index(indexName)
                .id(douBan.getId())
                .document(douBan));

        // 2. Static of() factory on the request class
        IndexRequest<DouBan> ofRequest = IndexRequest.of(i -> i
                .index(indexName)
                .id(douBan.getId())
                .document(douBan));
        IndexResponse ofIndexResponse = client.index(ofRequest);

        // 3. Classic builder
        IndexRequest.Builder<DouBan> builder = new IndexRequest.Builder<>();
        builder.index(indexName);
        builder.id(douBan.getId());
        builder.document(douBan);
        IndexResponse classicIndexResponse = client.index(builder.build());

        // 4. Asynchronous indexing; asyncClient is an ElasticsearchAsyncClient, which the
        //    configuration in 4.2 does not define (it can be built from the same transport)
        asyncClient.index(i -> i
                .index(indexName)
                .id(douBan.getId())
                .document(douBan)
        ).whenComplete((response, exception) -> {
            if (exception != null) {
                log.error("Failed to index", exception);
            } else {
                log.info("Indexed with version " + response.version());
            }
        });

        // 5. Index raw JSON; no id() is set, so Elasticsearch generates the _id
        String jsonData = "{\"id\":\"1741\",\"title\":\"三体\",\"author\":\"刘慈欣\",\"contentDesc\":\"内容简介\",\"wordCount\":50000,\"price\":52.5}";
        Reader input = new StringReader(jsonData);
        IndexRequest<JsonData> rawRequest = IndexRequest.of(i -> i
                .index(indexName)
                .withJson(input)
        );
        IndexResponse rawResponse = client.index(rawRequest);
        log.info("Indexed with version " + rawResponse.version());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    // Result.Created means a new document was created (Result.Updated if the id already existed)
    return String.valueOf(Result.Created.equals(indexResponse.result()));
}
4.4.2 Bulk-index documents
/**
 * Bulk save
 *
 * @throws IOException if the request fails
 */
@Test
void bulkSave() throws IOException {
    DouBan douBan1 = new DouBan("1002", "题名1", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    DouBan douBan2 = new DouBan("1003", "题名2", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    DouBan douBan3 = new DouBan("1004", "题名3", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    List<DouBan> douBanList = new ArrayList<>();
    douBanList.add(douBan1);
    douBanList.add(douBan2);
    douBanList.add(douBan3);
    BulkRequest.Builder br = new BulkRequest.Builder();
    for (DouBan douBan : douBanList) {
        // One index operation per document, all targeting the douban index
        br.operations(op -> op
                .index(idx -> idx
                        .index("douban_v1")
                        .id(douBan.getId())
                        .document(douBan)
                )
        );
    }
    BulkResponse result = client.bulk(br.build());
    // errors() is true if at least one operation failed; report each failure
    if (result.errors()) {
        log.error("Bulk had errors");
        for (BulkResponseItem item : result.items()) {
            if (item.error() != null) {
                log.error(item.error().reason());
            }
        }
    }
}
4.4.3 Bulk-index raw JSON data
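/**
 * Bulk save raw JSON files
 *
 * @throws IOException if a file cannot be read or the request fails
 */
@Test
void rawDataBulkSave() throws IOException {
    // Local directory holding the data files; adjust to your environment
    File logDir = new File("D:\\IdeaProjects\\client\\src\\main\\resources\\data");
    // Names that start with "bulk" and end with ".json"
    File[] logFiles = logDir.listFiles(
            file -> file.getName().matches("bulk.*\\.json")
    );
    if (logFiles == null || logFiles.length == 0) {
        log.warn("No bulk*.json files found in {}", logDir);
        return;
    }
    BulkRequest.Builder br = new BulkRequest.Builder();
    for (File file : logFiles) {
        // Each file holds one JSON document; close the stream once it has been read
        try (FileInputStream input = new FileInputStream(file)) {
            BinaryData data = BinaryData.of(IOUtils.toByteArray(input), ContentType.APPLICATION_JSON);
            br.operations(op -> op
                    .index(idx -> idx
                            .index("douban_v1")
                            .document(data)
                    )
            );
        }
    }
    BulkResponse result = client.bulk(br.build());
    if (result.errors()) {
        result.items().forEach(item -> {
            if (item.error() != null) {
                log.error(item.error().reason());
            }
        });
    }
    log.info("Bulk save succeeded: {}", !result.errors());
}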
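The file filter relies on `String.matches`, which must match the whole file name. In a regular expression `*` repeats only the preceding character, so `bulk*.*\\.json` means "bul, then any number of k, then anything, then .json" and also accepts names such as `bulletin.json`, whereas `bulk.*\\.json` requires the literal prefix `bulk`. A quick JDK check of both patterns (the class name is hypothetical):

```java
import java.util.regex.Pattern;

// Compares two file-name patterns with the same whole-string semantics as String.matches.
class BulkFileNamePattern {

    static final Pattern LOOSE = Pattern.compile("bulk*.*\\.json"); // "bul" + zero or more "k"
    static final Pattern STRICT = Pattern.compile("bulk.*\\.json"); // literal "bulk" prefix

    static boolean loose(String name) {
        return LOOSE.matcher(name).matches();
    }

    static boolean strict(String name) {
        return STRICT.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"bulk1.json", "bulletin.json", "bulk1.json.bak"}) {
            System.out.println(name + " loose=" + loose(name) + " strict=" + strict(name));
        }
    }
}
```

Both patterns reject `bulk1.json.bak`, because the whole name has to end in `.json`.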
4.5 Get a single document
// Get a document by id and map it to a Java object
public DouBan getById(String id) throws IOException {
    // Equivalent request built with the static of() factory:
    // GetRequest getRequest = GetRequest.of(x -> x.index(indexName).id(id));
    // GetResponse<DouBan> response = client.get(getRequest, DouBan.class);
    GetResponse<DouBan> response = client.get(g -> g
                    .index(indexName)
                    .id(id),
            DouBan.class
    );
    if (!response.found()) {
        // BusinessException is the project's own exception class
        throw new BusinessException("No document found for the given id");
    }
    DouBan douBan = response.source();
    log.info("Document title: " + douBan.getTitle());
    return douBan;
}

// Get a document by id as raw JSON (a Jackson ObjectNode)
public ObjectNode getRawById(String id) throws IOException {
    GetResponse<ObjectNode> response = client.get(g -> g
                    .index(indexName)
                    .id(id),
            ObjectNode.class
    );
    if (!response.found()) {
        log.info("data not found");
        return null;
    }
    ObjectNode json = response.source();
    log.info("title " + json.get("title").asText());
    return json;
}
4.6 Searching documents
4.6.1 Basic search query
public List<DouBan> search(String searchText) {
    SearchResponse<DouBan> response;
    try {
        response = client.search(s -> s
                        .index(indexName)
                        .query(q -> q
                                .match(t -> t
                                        .field("title")
                                        .query(searchText)
                                )
                        ),
                DouBan.class
        );
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    // total is absent when total-hit tracking is disabled
    TotalHits total = response.hits().total();
    if (total != null) {
        // Eq: exact count; Gte: a lower bound (by default counting stops at 10,000 hits)
        boolean isExactResult = total.relation() == TotalHitsRelation.Eq;
        if (isExactResult) {
            log.info("There are " + total.value() + " results");
        } else {
            log.info("There are more than " + total.value() + " results");
        }
    }
    List<Hit<DouBan>> hits = response.hits().hits();
    List<DouBan> list = new ArrayList<>();
    for (Hit<DouBan> hit : hits) {
        DouBan douBan = hit.source();
        list.add(douBan);
        log.info("Found DouBan " + douBan.getTitle() + ", score " + hit.score());
    }
    return list;
}
4.6.2 Compound (bool) search query
public List<DouBan> search2(String searchText, Double price) {
    // Full-text match on the title
    Query titleQuery = MatchQuery.of(m -> m
            .field("title")
            .query(searchText)
    )._toQuery();
    // Price greater than or equal to the given value
    Query rangeQuery = RangeQuery.of(r -> r
            .field("price")
            .gte(JsonData.of(price))
    )._toQuery();
    try {
        // bool query: both clauses must match
        SearchResponse<DouBan> search = client.search(s -> s
                        .index(indexName)
                        .query(q -> q
                                .bool(b -> b
                                        .must(titleQuery)
                                        .must(rangeQuery)
                                )
                        ),
                DouBan.class
        );
        // Parse the search results
        List<DouBan> douBanList = new ArrayList<>();
        for (Hit<DouBan> hit : search.hits().hits()) {
            douBanList.add(hit.source());
        }
        return douBanList;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
4.6.3 Search templates
public List<DouBan> searchByTemplate() throws IOException {
    // Create a stored script whose source is a search request body (Mustache template)
    client.putScript(r -> r
            .id("query-script")
            .script(s -> s
                    .lang("mustache")
                    .source("{\"query\":{\"match\":{\"{{field}}\":\"{{value}}\"}}}")
            ));
    // Run the template, filling in its parameters
    SearchTemplateResponse<DouBan> response = client.searchTemplate(r -> r
                    .index("douban_v1")
                    .id("query-script")
                    .params("field", JsonData.of("title"))
                    .params("value", JsonData.of("题名")),
            DouBan.class
    );
    // Parse the results
    List<DouBan> list = new ArrayList<>();
    for (Hit<DouBan> hit : response.hits().hits()) {
        DouBan douBan = hit.source();
        list.add(douBan);
        log.info("Found DouBan " + douBan.getTitle() + ", score " + hit.score());
    }
    return list;
}
4.7 Document aggregations
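public void aggregateByAuthor(String searchText) throws IOException {
    Query query = MatchQuery.of(t -> t
            .field("title")
            .query(searchText))._toQuery();
    // author is a text field; terms aggregations need its keyword sub-field author.keyword
    Aggregation authorAgg = AggregationBuilders.terms().field("author.keyword").build()._toAggregation();
    SearchResponse<DouBan> response = client.search(s -> s
                    .index(indexName)
                    .query(query)
                    .aggregations("author", authorAgg),
            DouBan.class
    );
    // One bucket per author, with the number of matching documents
    List<StringTermsBucket> buckets = response.aggregations().get("author").sterms().buckets().array();
    for (StringTermsBucket bucket : buckets) {
        log.info("author: {}, count: {}", bucket.key(), bucket.docCount());
    }
}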