SpringBoot+ElasticSearch实现文档内容抽取、高亮分词、全文检索

news2025/4/17 1:41:57

需求
产品希望我们这边能够实现用户上传PDF、WORD、TXT之内得文本内容,然后用户可以根据附件名称或文件内容模糊查询文件信息,并可以在线查看文件内容。

一、环境
项目开发环境:

后台管理系统springboot+mybatis_plus+mysql+es
搜索引擎:elasticsearch7.9.3 +kibana图形化界面

二、功能实现
1.搭建环境
es+kibana的搭建这里就不介绍了,网上多的是

后台程序搭建也不介绍,这里有一点很重要,Java使用的连接es的包的版本一定要和es的版本对应上,不然你会有各种问题

2.文件内容识别
第一步: 要用es实现文本附件内容的识别,需要先给es安装一个插件:Ingest Attachment Processor Plugin

这知识一个内容识别的插件,还有其它的例如OCR之类的其它插件,有兴趣的可以去搜一下了解一下

Ingest Attachment Processor Plugin是一个文本抽取插件,本质上是利用了Elasticsearch的ingest node功能,提供了关键的预处理器attachment。在安装目录下运行以下命令即可安装。

到es的安装文件bin目录下执行

elasticsearch-plugin install ingest-attachment

因为我们这里es是使用docker安装的,所以需要进入到es的docker镜像里面的bin目录下安装插件

[root@iZuf63d0pqnjrga4pi18udZ plugins]# docker exec -it es bash
[root@elasticsearch elasticsearch]# ls
LICENSE.txt  NOTICE.txt  README.asciidoc  bin  config  data  jdk  lib  logs  modules  plugins
[root@elasticsearch elasticsearch]# cd bin/
[root@elasticsearch bin]# ls
elasticsearch          elasticsearch-certutil  elasticsearch-croneval  elasticsearch-env-from-file  elasticsearch-migrate  elasticsearch-plugin         elasticsearch-setup-passwords  elasticsearch-sql-cli            elasticsearch-syskeygen  x-pack-env           x-pack-watcher-env
elasticsearch-certgen  elasticsearch-cli       elasticsearch-env       elasticsearch-keystore       elasticsearch-node     elasticsearch-saml-metadata  elasticsearch-shard            elasticsearch-sql-cli-7.9.3.jar  elasticsearch-users      x-pack-security-env
[root@elasticsearch bin]# elasticsearch-plugin install ingest-attachment
-> Installing ingest-attachment
-> Downloading ingest-attachment from elastic
[=================================================] 100%?? 
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission accessClassInPackage.sun.java2d.cmm.kcms
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission getClassLoader
* java.lang.reflect.ReflectPermission suppressAccessChecks
* java.security.SecurityPermission createAccessControlContext
* java.security.SecurityPermission insertProvider
* java.security.SecurityPermission putProviderProperty.BC
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
 
Continue with installation? [y/N]y
-> Installed ingest-attachment

显示installed 就表示安装完成了,然后重启es,不然第二步要报错

第二步:创建一个文本抽取的管道

主要是用于将上传的附件转换成文本内容,支持(word,PDF,txt,excel没试,应该也支持)
在这里插入图片描述

{
    "description": "Extract attachment information",
    "processors": [
        {
            "attachment": {
                "field": "content",
                "ignore_missing": true
            }
        },
        {
            "remove": {
                "field": "content"
            }
        }
    ]
}

第三步:定义我们内容存储的索引
在这里插入图片描述

{
  "mappings": {
    "properties": {
      "id":{
        "type": "keyword"
      },
      "fileName":{
        "type": "text",
        "analyzer": "my_ana"
      },
      "contentType":{
        "type": "text",
         "analyzer": "my_ana"
      },
       "fileUrl":{
        "type": "text"
      },
      "attachment": {
        "properties": {
          "content":{
            "type": "text",
            "analyzer": "my_ana"
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords_path": "stopword/stopwords.txt"
        },
        "jieba_synonym": {
          "type":        "synonym",
          "synonyms_path": "synonym/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}

mapping:定义的是存储的字段格式
setting:索引的配置信息,这边定义了一个分词(使用的是jieba的分词)

注意:内容检索的是attachment.content字段,一定要使用分词,不使用分词的话,检索会检索不出来内容

第四步:测试

在这里插入图片描述

{
    "id":"1",
 "name":"进口红酒",
 "filetype":"pdf",
    "contenttype":"文章",
 "content":"文章内容"
}

测试内容需要将附件转换成base64格式

在线转换文件的地址:https://www.zhangxinxu.com/sp/base64.html

查询刚刚上传的文件:
在这里插入图片描述

{
    "took": 861,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "fileinfo",
                "_type": "_doc",
                "_id": "lkPEgYIBz3NlBKQzXYX9",
                "_score": 1.0,
                "_source": {
                    "fileName": "测试_20220809164145A002.docx",
                    "updateTime": 1660034506000,
                    "attachment": {
                        "date": "2022-08-09T01:38:00Z",
                        "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                        "author": "DELL",
                        "language": "lt",
                        "content": "内容",
                        "content_length": 2572
                    },
                    "createTime": 1660034506000,
                    "fileUrl": "http://localhost:8092/fileInfo/profile/upload/fileInfo/2022/08/09/测试_20220809164145A002.docx",
                    "id": 1306333192,
                    "contentType": "文章",
                    "fileType": "docx"
                }
            },
            {
                "_index": "fileinfo",
                "_type": "_doc",
                "_id": "mUPHgYIBz3NlBKQzwIVW",
                "_score": 1.0,
                "_source": {
                    "fileName": "测试_20220809164527A001.docx",
                    "updateTime": 1660034728000,
                    "attachment": {
                        "date": "2022-08-09T01:38:00Z",
                        "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                        "author": "DELL",
                        "language": "lt",
                        "content": "内容",
                        "content_length": 2572
                    },
                    "createTime": 1660034728000,
                    "fileUrl": "http://localhost:8092/fileInfo/profile/upload/fileInfo/2022/08/09/测试_20220809164527A001.docx",
                    "id": 1306333193,
                    "contentType": "文章",
                    "fileType": "docx"
                }
            },
            {
                "_index": "fileinfo",
                "_type": "_doc",
                "_id": "JDqshoIBbkTNu1UgkzFK",
                "_score": 1.0,
                "_source": {
                    "fileName": "txt测试_20220810153351A001.txt",
                    "updateTime": 1660116831000,
                    "attachment": {
                        "content_type": "text/plain; charset=UTF-8",
                        "language": "lt",
                        "content": "内容",
                        "content_length": 804
                    },
                    "createTime": 1660116831000,
                    "fileUrl": "http://localhost:8092/fileInfo/profile/upload/fileInfo/2022/08/10/txt测试_20220810153351A001.txt",
                    "id": 1306333194,
                    "contentType": "告示",
                    "fileType": "txt"
                }
            }
        ]
    }
}

我们调用上传的接口,可以看到文本内容已经抽取到es里面了,后面就可以直接分词检索内容,高亮显示了

三.代码
介绍下代码实现逻辑:文件上传,数据库存储附件信息和附件上传地址;调用es实现文本内容抽取,将抽取的内容放到对应索引下;提供小程序全文检索的api实现根据文件名称关键词联想,文件名称内容全文检索模糊匹配,并高亮显示分词匹配字段;直接贴代码

yml配置文件:

# 数据源配置
spring:
    # 服务模块
    devtools:
        restart:
            # 热部署开关
            enabled: true
    # 搜索引擎
    elasticsearch:
        rest:
            url: 127.0.0.1
            uris: 127.0.0.1:9200
            connection-timeout: 1000
            read-timeout: 3000
            username: elastic
            password: 123456

elsticsearchConfig(连接配置)

package com.yj.rselasticsearch.domain.config;
 
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
 
import java.time.Duration;
 
@Configuration
public class ElasticsearchConfig {
    @Value("${spring.elasticsearch.rest.url}")
    private String edUrl;
    @Value("${spring.elasticsearch.rest.username}")
    private String userName;
    @Value("${spring.elasticsearch.rest.password}")
    private String password;
 
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        //设置连接的用户名密码
        final BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(userName, password));
        RestHighLevelClient client =  new RestHighLevelClient(RestClient.builder(
                        new HttpHost(edUrl, 9200,"http"))
                .setHttpClientConfigCallback(httpClientBuilder -> {
                    httpClientBuilder.disableAuthCaching();
                    //保持连接池处于链接状态,该bug曾导致es一段时间没使用,第一次连接访问超时
                    httpClientBuilder.setKeepAliveStrategy(((response, context) -> Duration.ofMinutes(5).toMillis()));
                    return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                })
        );
        return client;
    }
}

文件上传保存文件信息并抽取内容到es

实体对象FileInfo

package com.yj.common.core.domain.entity;
 
import com.baomidou.mybatisplus.annotation.TableField;
import com.yj.common.core.domain.BaseEntity;
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.Setter;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
 
import java.util.Date;
 
@Setter
@Getter
@Document(indexName = "fileinfo",createIndex = false)
public class FileInfo {
    /**
    * 主键
    */
    @Field(name = "id", type = FieldType.Integer)
    private Integer id;
 
    /**
    * 文件名称
    */
    @Field(name = "fileName", type = FieldType.Text,analyzer = "jieba_index",searchAnalyzer = "jieba_index")
    private String fileName;
 
    /**
    * 文件类型
    */
    @Field(name = "fileType",  type = FieldType.Keyword)
    private String fileType;
 
    /**
    * 内容类型
    */
    @Field(name = "contentType", type = FieldType.Text)
    private String contentType;
 
    /**
     * 附件内容
     */
    @Field(name = "attachment.content", type = FieldType.Text,analyzer = "jieba_index",searchAnalyzer = "jieba_index")
    @TableField(exist = false)
    private String content;
 
    /**
    * 文件地址
    */
    @Field(name = "fileUrl", type = FieldType.Text)
    private String fileUrl;
 
    /**
     * 创建时间
     */
    private Date createTime;
 
    /**
     * 更新时间
     */
    private Date updateTime;
}

controller类

package com.yj.rselasticsearch.controller;
 
import com.yj.common.core.controller.BaseController;
import com.yj.common.core.domain.AjaxResult;
import com.yj.common.core.domain.entity.FileInfo;
import com.yj.rselasticsearch.service.FileInfoService;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;
 
import javax.annotation.Resource;
 
/**
 * (file_info)表控制层
 *
 * @author xxxxx
 */
@RestController
@RequestMapping("/fileInfo")
public class FileInfoController extends BaseController {
    /**
     * 服务对象
     */
    @Resource
    private FileInfoService fileInfoService;
 
 
    @PutMapping("uploadFile")
    public AjaxResult uploadFile(String contentType, MultipartFile file) {
        return fileInfoService.uploadFileInfo(contentType,file);
    }
}

serviceImpl实现类

package com.yj.rselasticsearch.service.impl;
 
import com.alibaba.fastjson.JSON;
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.yj.common.config.RuoYiConfig;
import com.yj.common.core.domain.AjaxResult;
import com.yj.common.utils.FastUtils;
import com.yj.common.utils.StringUtils;
import com.yj.common.utils.file.FileUploadUtils;
import com.yj.common.utils.file.FileUtils;
import com.yj.framework.config.ServerConfig;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.stereotype.Service;
import javax.annotation.Resource;
import com.yj.common.core.domain.entity.FileInfo;
import com.yj.rselasticsearch.mapper.FileInfoMapper;
import com.yj.rselasticsearch.service.FileInfoService;
import org.springframework.web.multipart.MultipartFile;
 
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Base64;
 
@Service
@Slf4j
public class FileInfoServiceImpl implements FileInfoService{
    @Resource
    private ServerConfig serverConfig;
 
    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;
 
    @Resource
    private FileInfoMapper fileInfoMapper;
 
    /**
     * 上传文件并进行文件内容识别上传到es
     * @param contentType
     * @param file
     * @return
     */
    @Override
    public AjaxResult uploadFileInfo(String contentType, MultipartFile file) {
        if (FastUtils.checkNullOrEmpty(contentType,file)){
            return AjaxResult.error("请求参数不能为空");
        }
        try {
            // 上传文件路径
            String filePath = RuoYiConfig.getUploadPath() + "/fileInfo";
            FileInfo fileInfo = new FileInfo();
            // 上传并返回新文件名称
            String fileName = FileUploadUtils.upload(filePath, file);
            String prefix = fileName.substring(fileName.lastIndexOf(".")+1);
            File files = File.createTempFile(fileName, prefix);
            file.transferTo(files);
            String url = serverConfig.getUrl() + "/fileInfo" + fileName;
            fileInfo.setFileName(FileUtils.getName(fileName));
            fileInfo.setFileType(prefix);
            fileInfo.setFileUrl(url);
            fileInfo.setContentType(contentType);
            int result = fileInfoMapper.insertSelective(fileInfo);
            if (result > 0) {
                fileInfo = fileInfoMapper.selectOne(new LambdaQueryWrapper<FileInfo>().eq(FileInfo::getFileUrl,fileInfo.getFileUrl()));
                byte[] bytes = getContent(files);
                String base64 = Base64.getEncoder().encodeToString(bytes);
                fileInfo.setContent(base64);
                IndexRequest indexRequest = new IndexRequest("fileinfo");
                //上传同时,使用attachment pipline进行提取文件
                indexRequest.source(JSON.toJSONString(fileInfo), XContentType.JSON);
                indexRequest.setPipeline("attachment");
                IndexResponse indexResponse = client.index(indexRequest, RequestOptions.DEFAULT);
                log.info("indexResponse:" + indexResponse);
            }
            AjaxResult ajax = AjaxResult.success(fileInfo);
            return ajax;
        } catch (Exception e) {
            return AjaxResult.error(e.getMessage());
        }
    }
 
 
     /**
     * 文件转base64
     *
     * @param file
     * @return
     * @throws IOException
     */
    private byte[] getContent(File file) throws IOException {
 
        long fileSize = file.length();
        if (fileSize > Integer.MAX_VALUE) {
            log.info("file too big...");
            return null;
        }
        FileInputStream fi = new FileInputStream(file);
        byte[] buffer = new byte[(int) fileSize];
        int offset = 0;
        int numRead = 0;
        while (offset < buffer.length
                && (numRead = fi.read(buffer, offset, buffer.length - offset)) >= 0) {
            offset += numRead;
        }
        // 确保所有数据均被读取
        if (offset != buffer.length) {
            throw new ServiceException("Could not completely read file "
                    + file.getName());
        }
        fi.close();
        return buffer;
    }
}

高亮分词检索

参数请求WarningInfoDto

package com.yj.rselasticsearch.domain.dto;
 
import com.yj.common.core.domain.entity.WarningInfo;
import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.Data;
 
import java.util.List;
 
/**
 * 前端请求数据传输
 * WarningInfo
 * @author luoY
 */
@Data
@ApiModel(value ="WarningInfoDto",description = "告警信息")
public class WarningInfoDto{
    /**
     * 页数
     */
    @ApiModelProperty("页数")
    private Integer pageIndex;
 
    /**
     * 每页数量
     */
    @ApiModelProperty("每页数量")
    private Integer pageSize;
 
    /**
     * 查询关键词
     */
    @ApiModelProperty("查询关键词")
    private String keyword;
 
    /**
     * 内容类型
     */
    private List<String> contentType;
 
    /**
     * 用户手机号
     */
    private String phone;
}

controller类

package com.yj.rselasticsearch.controller;
 
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.yj.common.core.controller.BaseController;
import com.yj.common.core.domain.AjaxResult;
import com.yj.common.core.domain.entity.FileInfo;
import com.yj.common.core.domain.entity.WarningInfo;
import com.yj.rselasticsearch.service.ElasticsearchService;
import com.yj.rselasticsearch.service.WarningInfoService;
import io.swagger.annotations.Api;
import io.swagger.annotations.ApiImplicitParam;
import io.swagger.annotations.ApiImplicitParams;
import io.swagger.annotations.ApiOperation;
import org.springframework.web.bind.annotation.*;
import com.yj.rselasticsearch.domain.dto.WarningInfoDto;
 
import javax.annotation.Resource;
import javax.servlet.http.HttpServletRequest;
import java.util.List;
 
/**
 * es搜索引擎
 *
 * @author luoy
 */
@Api("搜索引擎")
@RestController
@RequestMapping("es")
public class ElasticsearchController extends BaseController {
    @Resource
    private ElasticsearchService elasticsearchService;
 
    /**
     * 告警信息关键词联想
     *
     * @param warningInfoDto
     * @return
     */
    @ApiOperation("关键词联想")
    @ApiImplicitParams({
            @ApiImplicitParam(name = "contenttype", value = "文档类型", required = true, dataType = "String", dataTypeClass = String.class),
            @ApiImplicitParam(name = "keyword", value = "关键词", required = true, dataType = "String", dataTypeClass = String.class)
    })
    @PostMapping("getAssociationalWordDoc")
    public AjaxResult getAssociationalWordDoc(@RequestBody WarningInfoDto warningInfoDto, HttpServletRequest request) {
        List<String> words = elasticsearchService.getAssociationalWordOther(warningInfoDto,request);
        return AjaxResult.success(words);
    }
 
 
    /**
     * 告警信息高亮分词分页查询
     *
     * @param warningInfoDto
     * @return
     */
    @ApiOperation("高亮分词分页查询")
    @ApiImplicitParams({
            @ApiImplicitParam(name = "keyword", value = "关键词", required = true, dataType = "String", dataTypeClass = String.class),
            @ApiImplicitParam(name = "pageIndex", value = "页码", required = true, dataType = "Integer", dataTypeClass = Integer.class),
            @ApiImplicitParam(name = "pageSize", value = "页数", required = true, dataType = "Integer", dataTypeClass = Integer.class),
            @ApiImplicitParam(name = "contenttype", value = "文档类型", required = true, dataType = "String", dataTypeClass = String.class)
    })
    @PostMapping("queryHighLightWordDoc")
    public AjaxResult queryHighLightWordDoc(@RequestBody WarningInfoDto warningInfoDto,HttpServletRequest request) {
        IPage<FileInfo> warningInfoListPage = elasticsearchService.queryHighLightWordOther(warningInfoDto,request);
        return AjaxResult.success(warningInfoListPage);
    }
}

serviceImpl实现类

package com.yj.rselasticsearch.service.impl;
 
import com.alibaba.fastjson.JSON;
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.baomidou.mybatisplus.core.metadata.IPage;
import com.baomidou.mybatisplus.extension.plugins.pagination.Page;
import com.yj.common.constant.DataConstants;
import com.yj.common.constant.HttpStatus;
import com.yj.common.core.domain.entity.FileInfo;
import com.yj.common.core.domain.entity.WarningInfo;
import com.yj.common.core.domain.entity.WhiteList;
import com.yj.common.core.redis.RedisCache;
import com.yj.common.exception.ServiceException;
import com.yj.common.utils.FastUtils;
import com.yj.rselasticsearch.domain.dto.RetrievalRecordDto;
import com.yj.rselasticsearch.domain.dto.WarningInfoDto;
import com.yj.rselasticsearch.domain.vo.MemberVo;
import com.yj.rselasticsearch.service.*;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.Operator;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.*;
import org.springframework.stereotype.Service;
 
import javax.annotation.Resource;
import javax.servlet.http.HttpServletRequest;
import java.util.*;
import java.util.stream.Collectors;
 
@Service
@Slf4j
public class ElasticsearchServiceImpl implements ElasticsearchService {
 
    @Resource
    private WhiteListService whiteListService;
 
    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;
 
    @Autowired
    private RedisCache redisCache;
 
    @Resource
    private TokenService tokenService;
 
 
    /**
     * 文档信息关键词联想(根据输入框的词语联想文件名称)
     *
     * @param warningInfoDto
     * @return
     */
    @Override
    public List<String> getAssociationalWordOther(WarningInfoDto warningInfoDto, HttpServletRequest request) {
        //需要查询的字段
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
                .should(QueryBuilders.matchBoolPrefixQuery("fileName", warningInfoDto.getKeyword()));
        //contentType标签内容过滤
        boolQueryBuilder.must(QueryBuilders.termsQuery("contentType", warningInfoDto.getContentType()));
        //构建高亮查询
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(boolQueryBuilder)
                .withHighlightFields(
                        new HighlightBuilder.Field("fileName")
                )
                .withHighlightBuilder(new HighlightBuilder().preTags("<span style='color:red'>").postTags("</span>"))
                .build();
        //查询
        SearchHits<FileInfo> search = null;
        try {
            search = elasticsearchRestTemplate.search(searchQuery, FileInfo.class);
        } catch (Exception ex) {
            ex.printStackTrace();
            throw new ServiceException(String.format("操作错误,请联系管理员!%s", ex.getMessage()));
        }
        //设置一个最后需要返回的实体类集合
        List<String> resultList = new LinkedList<>();
        //遍历返回的内容进行处理
        for (org.springframework.data.elasticsearch.core.SearchHit<FileInfo> searchHit : search.getSearchHits()) {
            //高亮的内容
            Map<String, List<String>> highlightFields = searchHit.getHighlightFields();
            //将高亮的内容填充到content中
            searchHit.getContent().setFileName(highlightFields.get("fileName") == null ? searchHit.getContent().getFileName() : highlightFields.get("fileName").get(0));
            if (highlightFields.get("fileName") != null) {
                resultList.add(searchHit.getContent().getFileName());
            }
        }
        //list去重
        List<String> newResult = null;
        if (!FastUtils.checkNullOrEmpty(resultList)) {
            if (resultList.size() > 9) {
                newResult = resultList.stream().distinct().collect(Collectors.toList()).subList(0, 9);
            } else {
                newResult = resultList.stream().distinct().collect(Collectors.toList());
            }
        }
        return newResult;
    }
 
    /**
     * 高亮分词搜索其它类型文档
     *
     * @param warningInfoDto
     * @param request
     * @return
     */
    @Override
    public IPage<FileInfo> queryHighLightWordOther(WarningInfoDto warningInfoDto, HttpServletRequest request) {
        //分页
        Pageable pageable = PageRequest.of(warningInfoDto.getPageIndex() - 1, warningInfoDto.getPageSize());
         //需要查询的字段,根据输入的内容分词全文检索fileName和content字段
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
                .should(QueryBuilders.matchBoolPrefixQuery("fileName", warningInfoDto.getKeyword()))
                .should(QueryBuilders.matchBoolPrefixQuery("attachment.content", warningInfoDto.getKeyword()));
        //contentType标签内容过滤
        boolQueryBuilder.must(QueryBuilders.termsQuery("contentType", warningInfoDto.getContentType()));
        //构建高亮查询
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(boolQueryBuilder)
                .withHighlightFields(
                        new HighlightBuilder.Field("fileName"), new HighlightBuilder.Field("attachment.content")
                )
                .withHighlightBuilder(new HighlightBuilder().preTags("<span style='color:red'>").postTags("</span>"))
                .build();
        //查询
        SearchHits<FileInfo> search = null;
        try {
            search = elasticsearchRestTemplate.search(searchQuery, FileInfo.class);
        } catch (Exception ex) {
            ex.printStackTrace();
            throw new ServiceException(String.format("操作错误,请联系管理员!%s", ex.getMessage()));
        }
        //设置一个最后需要返回的实体类集合
        List<FileInfo> resultList = new LinkedList<>();
        //遍历返回的内容进行处理
        for (org.springframework.data.elasticsearch.core.SearchHit<FileInfo> searchHit : search.getSearchHits()) {
            //高亮的内容
            Map<String, List<String>> highlightFields = searchHit.getHighlightFields();
            //将高亮的内容填充到content中
            searchHit.getContent().setFileName(highlightFields.get("fileName") == null ? searchHit.getContent().getFileName() : highlightFields.get("fileName").get(0));
            searchHit.getContent().setContent(highlightFields.get("content") == null ? searchHit.getContent().getContent() : highlightFields.get("content").get(0));
            resultList.add(searchHit.getContent());
        }
        //手动分页返回信息
        IPage<FileInfo> warningInfoIPage = new Page<>();
        warningInfoIPage.setTotal(search.getTotalHits());
        warningInfoIPage.setRecords(resultList);
        warningInfoIPage.setCurrent(warningInfoDto.getPageIndex());
        warningInfoIPage.setSize(warningInfoDto.getPageSize());
        warningInfoIPage.setPages(warningInfoIPage.getTotal() % warningInfoDto.getPageSize());
        return warningInfoIPage;
    }
}

代码测试:
在这里插入图片描述

--请求jason
{
    "keyword":"全库备份",
    "contentType":["告示"],
    "pageIndex":1,
    "pageSize":10
}
 
 
--响应
{
    "msg": "操作成功",
    "code": 200,
    "data": {
        "records": [
            {
                "id": 1306333194,
                "fileName": "txt测试_20220810153351A001.txt",
                "fileType": "txt",
                "contentType": "告示",
                "content": "•\t秒级快速<span style='color:red'>备份</span>\r\n不论多大的数据量,<span style='color:red'>全库</span><span style='color:red'>备份</span>只需30秒,而且<span style='color:red'>备份过程</span>不会对数据库加锁,对应用程序几乎无影响,全天24小时均可进行<span style='color:red'>备份</span>。",
                "fileUrl": "http://localhost:8092/fileInfo/profile/upload/fileInfo/2022/08/10/txt测试_20220810153351A001.txt",
                "createTime": "2022-08-10T15:33:51.000+08:00",
                "updateTime": "2022-08-10T15:33:51.000+08:00"
            }
        ],
        "total": 1,
        "size": 10,
        "current": 1,
        "orders": [],
        "optimizeCountSql": true,
        "searchCount": true,
        "countId": null,
        "maxLimit": null,
        "pages": 1
    }
}

返回的内容将分词检索到匹配的内容,并将匹配的词高亮显示。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1547039.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

2024.3.26学习笔记

今日学习韩顺平java0200_韩顺平Java_对象机制练习_哔哩哔哩_bilibili 今日学习p273-p285 包 包的本质实际上就是创建不同的文件夹/目录来保存类文件 包的三大作用 区分相同名字的类 当类很多时&#xff0c;可以很好的管理类 控制访问范围 包的基本语法 package com.xx…

函数进阶-Python

师从黑马程序员 函数中多个返回值的接收 def test_return():return 1,"hello",3x,y,ztest_return() print(x) print(y) print(z) 多种参数的使用 函数参数种类 位置参数 关键字参数 def user_info(name,age,gender):print(f"姓名是{name},年龄是:{age},性别是…

nandgame中的控制单元(Control Unit)

关卡说明的翻译&#xff1a; 控制单元除了ALU指令之外&#xff0c;计算机还应支持数据指令。在数据指令中&#xff0c;指令值直接写入A寄存器。创建一个控制单元&#xff0c;根据指令I的高位执行数据指令或ALU指令&#xff1a;位 15 0 数据指令 1 ALU指令ALU指令 对于ALU指令&…

Linux虚拟机的安装部署--尚硅谷笔记

part1 VMware的使用 学习目标 1 熟悉VMware软件的使用 2 可以熟练为虚拟计算机安装Linux操作系统 3 能独立解决安装过程中的常见问题 第一节 VMware的作用 VMware软件的作用 ![外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传] 第一步&#xff0c;在W…

【最新!红外小目标检测算法HCFNet】

文章目录 摘要1 引言2 相关工作2.1 传统方法2.2 深度学习方法 3 方法3.1 PPA3.2 维度感知选择性整合模块3.3 多稀释通道细化器模块3.4 损失函数设计 4 实验4.1 数据集与评估指标4.2 实现细节4.3 消融和对比 5 结论 论文&#xff1a;HCF-Net: Hierarchical Context Fusion Netwo…

监控API的指标

监控服务器已经是常态了&#xff0c;但是监控API的表现是啥意思呢&#xff1f;还有监控指标&#xff1f;今天就来看看如何监控API。 正如监控应用程序以确保高质量性能一样&#xff0c;也必须监控API。 API是应用程序相互通信的管道。更具体地说&#xff0c;API提供了一种方法…

Day42:WEB攻防-PHP应用MYSQL架构SQL注入跨库查询文件读写权限操作

目录 PHP-MYSQL-Web组成架构 PHP-MYSQL-SQL常规查询 手工注入 PHP-MYSQL-SQL跨库查询 跨库注入 PHP-MYSQL-SQL文件读写 知识点&#xff1a; 1、PHP-MYSQL-SQL注入-常规查询 2、PHP-MYSQL-SQL注入-跨库查询 3、PHP-MYSQL-SQL注入-文件读写 MYSQL注入&#xff1a;&#xff…

ROM-IP

1.原理 通过添加数据文件&#xff0c;使ROM看起来不是易失性存储器&#xff0c; 产生256个数据&#xff0c;每个数据的位宽是8 如果前面为x&#xff0c;后面就是x256-1 2.单端口ROM配置 FPGA内部没有非易失性存储器。调用的ROM和RAM都是用RAM来生成的 3.双端口ROM配置 使用第一…

《操作系统导论》第10章读书笔记:多处理器调度(高级)

《操作系统导论》第10章读书笔记&#xff1a;多处理器调度(高级) —— 杭州 2024-03-26 夜 文章目录 《操作系统导论》第10章读书笔记&#xff1a;多处理器调度(高级)1.背景&#xff1a;多处理器架构2.别忘了同步3.最后一个问题&#xff1a;缓存亲和度4.单队列调度和多队列调度…

GDAl 之绘制栅格图像的大致直方图和精准直方图(8)

gdal的绘制大致直方图是仅查看概览或者抽样像素的一个子集 import os from osgeo import gdal import matplotlib.pyplot as plt import numpy as np# Dont forget to change directory. os.chdir(rD:\DeskTop\learn_py_must\Learn_GDAL\osgeopy-data\osgeopy-data\Switzerlan…

Obsidian+PicGo+Gitee搭建免费图床

之前使用PicGoGitee配合Typora&#xff0c;后来因为换电脑Typora管理笔记不方便&#xff0c;换到Obsidian笔记&#xff0c;此处记录重新搭建图床的坑与经验。 主要参考# picGogitee搭建Obsidian图床&#xff0c;实现高效写作&#xff01; 1 下载安装PicGo 下载链接https://mo…

Nginx(Docker 安装的nginx)配置域名SSL证书

1.首先确保Linux环境上已经安装了docker&#xff08;可参考VMware使用和Linux安装Docker_wmware直接部署linux和安装docker后-CSDN博客 2.通过docker 安装nginx&#xff08;可参考Linux 环境安装Nginx—源码和Dokcer-CSDN博客&#xff09; 3.安装SSL证书 3.1 在宿主机中创建…

Java零基础入门到精通_Day 2

18 算数运算符 - * / % 整数的运算只能得到整数 除非用浮点数进行运算&#xff08;得到浮点数&#xff09; public class Base_002 {public static void main(String[] args) {double a 6.0;int b 4;System.out.println(a/b); //1.5} } 19 字符的操作 public class Base_0…

鸿蒙OS封装【axios 网络请求】(类似Android的Okhttp3)

Okhttp.ets /*** 网络请求*/ import axios from ohos/axios import httpConstants from ../net/HttpConstants import errorCode from ../utils/errorCode import toast from ../utils/ToastUtils import router from ../utils/RouterUtils import SPUtils from ../utils/SPUt…

Transformer的前世今生 day08(Positional Encoding)

前情提要 Attention的优点&#xff1a;解决了长序列依赖问题&#xff0c;可以并行。Attention的缺点&#xff1a;开销变大了&#xff0c;而且不存在位置关系为了解决Attention中不存在位置关系的缺点&#xff0c;我们通过位置编码的形式加上位置关系 Positional Encoding&…

【保姆级讲解Edge兼容性问题解决方法】

&#x1f308;个人主页:程序员不想敲代码啊&#x1f308; &#x1f3c6;CSDN优质创作者&#xff0c;CSDN实力新星&#xff0c;CSDN博客专家&#x1f3c6; &#x1f44d;点赞⭐评论⭐收藏 &#x1f91d; 希望本文对您有所裨益&#xff0c;如有不足之处&#xff0c;欢迎在评论区提…

头条网盘如何快速获取授权推广

近期可以说是网盘拉新的一个盛宴&#xff0c;好几家网盘为了抢夺用户&#xff0c;都在付费拉新用户&#xff0c;而如今头条网盘也需要开拓市场&#xff0c;方式也很简单粗暴&#xff0c;就是拿钱砸&#xff0c;而对于普通用户来说&#xff0c;只要获得授权&#xff0c;正是赚钱…

【Linux】基础 IO(动静态库)-- 详解

一、前言 为什么要使用别人的代码&#xff1f; 主要是为了提高程序开发的效率和程序的健壮性。 当别人把功能都实现了&#xff0c;然后我们再基于别人的代码去做二次开发&#xff0c;那么效率当然就提高了。其次&#xff0c;这里基于的别人当然不是随便找的一个人&#xff0c;…

基于SpringBoot和Leaflet的行政区划地图掩膜效果实战

目录 前言 一、掩膜小知识 1、GIS掩膜的实现原理 2、图层掩膜流程 二、使用插件 1、leaflet-mask介绍 2、核心代码解释 三、完整实例实现 1、后台逻辑实现 2、省级行政区划查询实现 3、行政区划定位及掩膜实现 4、成果展示 总结 前言 在之前的博客提过按空间矢量…

喜报!数维杯数学建模成功入围安徽工业大学学科竞赛推荐名单

喜报&#xff01;数维杯数学建模挑战赛成功入围安徽工业大学学科竞赛参赛目录。 安徽工业大学创新创业学院发布了《2023年度安徽工业大学大学生学科竞赛参赛目录》。 其中&#xff0c;数维杯大学生数学建模挑战赛成功入围竞赛参赛目录&#xff0c;感谢全国各高校对数维杯的高…