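This comes from how Elasticsearch (ES) highlighting works: when the full text is long and only a few terms match, the highlight result does not cover the whole text. If the whole text has to be shown with highlights, the matches must be replaced manually.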
1. Use the ES IK analyzer to get the tokens of the search term.
ES part:
// Chinese tokenization
POST /_analyze
{
"analyzer":"ik_smart",
"text":"谷歌浏览器"
}
// Result
{
"tokens": [
{
"token": "谷歌",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "浏览器",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 1
}
]
}
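Note: ik_smart uses the coarsest granularity and does not produce overlapping tokens. ik_max_word uses the finest granularity and does produce overlapping tokens. For highlighting, the coarsest granularity is enough.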
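Translate the ES request into Java:
// Main imports
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestHighLevelClient;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;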
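@Resource
private RestHighLevelClient restHighLevelClient;
// Jackson mapper used to build the request body and parse the response
private final ObjectMapper objectMapper = new ObjectMapper();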
/**
* Gets the ES tokenization result for a search keyword.
*
* @param searchContent the search keyword
* @return the list of tokens
*/
private List<String> getAnalyze(String searchContent) {
    List<String> tokens = new ArrayList<>();
    if (StringUtils.isNotEmpty(searchContent)) {
        String endpoint = "/_analyze";
        // Build the body with Jackson so that quotes and backslashes in the keyword are escaped
        String body = objectMapper.createObjectNode()
                .put("analyzer", "ik_smart")
                .put("text", searchContent)
                .toString();
        try {
            Request request = new Request("POST", endpoint);
            request.setJsonEntity(body);
            Response response = restHighLevelClient.getLowLevelClient().performRequest(request);
            // Make sure the response stream is closed after parsing
            try (InputStream content = response.getEntity().getContent()) {
                JsonNode jsonNode = objectMapper.readTree(content);
                if (jsonNode.has("tokens")) {
                    for (JsonNode token : jsonNode.get("tokens")) {
                        tokens.add(token.get("token").asText());
                    }
                }
            }
        } catch (IOException | UnsupportedOperationException e) {
            log.error("Failed to get the tokenization result from ES", e);
        }
    }
    return tokens;
}
2. Use the tokens obtained above to replace the matches in the full text.
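/**
* Replaces every occurrence of the given strings in the full text in a single pass.
* @param replaceStrList the strings to highlight
* @param content the full text
* @return the full text with highlight markup
*/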
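private String replaceHighlight(List<String> replaceStrList, String content) {
    // Nothing to highlight: return the text unchanged
    if (content == null || replaceStrList == null || replaceStrList.isEmpty()) {
        return content;
    }
    StringBuffer result = new StringBuffer();
    try {
        // Quote each token so regex metacharacters are matched literally; skip empty tokens
        List<String> quoted = new ArrayList<>();
        for (String replaceStr : replaceStrList) {
            if (replaceStr != null && !replaceStr.isEmpty()) {
                quoted.add(Pattern.quote(replaceStr));
            }
        }
        if (quoted.isEmpty()) {
            return content;
        }
        Pattern pattern = Pattern.compile(String.join("|", quoted));
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            String replacement = "<font class='eslight'>" + matcher.group() + "</font>";
            // quoteReplacement keeps '$' and '\' in the token from being read as group references
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
    } catch (Exception e) {
        log.error("Failed to apply highlight replacement", e);
        // Fall back to the original text instead of a partial result
        return content;
    }
    return result.toString();
}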
With this in place, the keywords are highlighted throughout the full text, token by token.
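The replacement step can be tried on its own, without an ES connection. The following is a minimal sketch under assumptions: the class name `HighlightDemo`, the method `highlight`, and the sample tokens are hypothetical and not part of the original code. It uses ASCII tokens that contain regex metacharacters to show why each token should be quoted, and it sorts longer tokens first so that a short token cannot shadow a longer one that starts with the same characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone demo of the replacement step (names are illustrative only)
class HighlightDemo {

    // Wraps every occurrence of any token in the highlight tag used in this article
    static String highlight(List<String> tokens, String content) {
        // Drop null and empty tokens; an empty alternative would match at every position
        List<String> usable = new ArrayList<>();
        for (String t : tokens) {
            if (t != null && !t.isEmpty()) {
                usable.add(t);
            }
        }
        if (content == null || usable.isEmpty()) {
            return content;
        }
        // Longer tokens first: with "ab|abc" the regex engine would stop at "ab"
        usable.sort((a, b) -> Integer.compare(b.length(), a.length()));

        // Quote each token so that characters such as '+' or '$' are matched literally
        List<String> quoted = new ArrayList<>();
        for (String t : usable) {
            quoted.add(Pattern.quote(t));
        }
        Matcher m = Pattern.compile(String.join("|", quoted)).matcher(content);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = "<font class='eslight'>" + m.group() + "</font>";
            // Escape '$' and '\' in the replacement text
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // In the article's flow, the tokens would come from getAnalyze(...)
        List<String> tokens = Arrays.asList("C++", "browser");
        System.out.println(highlight(tokens, "A C++ browser engine"));
    }
}
```

Note that the sketch does not HTML-escape the text itself. If the full text can contain markup from untrusted sources, escaping it before wrapping the matches would be a reasonable extra step.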