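Yesterday, while reviewing notes in Anki, I wanted a quick way to search within a note's explanation, so I looked into highlighting the matching search results. At first I tried to do it without any third-party library, but the match failed whenever the matching text was interrupted by HTML tags; in the end I used the jQuery mark.js library, which made it easy. Afterwards I wanted to see whether an AI could implement the feature without relying on an existing library. It gave me the code for a page, and when I tried it, it turned out to be completely bogus. Still, the AI's code gave me a hint: split the search text into individual characters, insert between each pair of adjacent characters a regular expression that matches an HTML tag, and allow that expression to occur zero or more times. That removes the effect of the HTML tags.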
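As a minimal sketch of this idea (not the page's actual code), the pattern can be built as shown here. The helper name buildTagTolerantPattern is hypothetical, and escaping every regular-expression metacharacter is an assumption added for safety:

```javascript
// Hypothetical helper: builds a pattern that tolerates HTML tags
// between any two adjacent characters of `term`.
function buildTagTolerantPattern(term) {
  const tagGap = '(?:<[^>]+>)*'; // zero or more HTML tags
  // Escape characters that are special in a regular expression.
  const chars = Array.from(term, ch => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  return new RegExp(chars.join(tagGap), 'gi');
}

const pattern = buildTagTolerantPattern('is');
const m = pattern.exec('我i<b>s们');
console.log(m[0]); // i<b>s
```

The tag group is placed only between characters, not after the last one, so a match never swallows the tags that follow it.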
So I did some coding and debugging. The basic functionality works, but I have not tested for potential bugs. The code is as follows:
<!DOCTYPE html>
<html lang="zh">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Search Highlighter</title>
<style>
.highlight {
background-color: yellow;
}
</style>
</head>
<body>
<input type="text" id="searchInput" value="is">
<button onclick="search()">Search</button>
<div id="textContainer">
This is some sample text to search through. You can replace this with your own content.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed et quam eget ipsum ullamcorper hendrerit.
Donec vitae quam eget quam aliquam tincidunt. Nullam pharetra quam ac quam tincidunt, at semper ipsum
varius. Donec nec enim at libero ullamcorper hendrerit. Sed ac quam eget quam aliquam tincidunt.
Nullam pharetra quam ac quam tincidunt, at semper ipsum varius.
我i<b>s们This is. some bold text.</b>
<i>This is some italic text.</i>
<a href="#">This is a link.</a>
</div>
<script>
function search() {
const searchTerm = document.getElementById('searchInput').value.toLowerCase();
const textContainer = document.getElementById('textContainer');
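//Clear existing highlights, i.e. remove each pair of highlight span tags; a highlight span never contains a tag, so [^<]* is enough and also covers empty spans
textContainer.innerHTML = textContainer.innerHTML.replaceAll(/<span class="highlight">([^<]*)<\/span>/gi,'$1');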
const text = textContainer.innerHTML;
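//An empty search term would compile to an empty pattern that matches everywhere without advancing lastIndex, so stop here
if(searchTerm === '') return;
const eleTag = '(?:<[^>]+>)*';
let highlightedHTML = '';
//Split the search string into single characters and join them with eleTag, so that two adjacent characters may have zero or more HTML tags between them
const sReg = Array.from(searchTerm, char => {
//A space matches any whitespace; every other character is escaped if it is special in a JavaScript regular expression
if(char === ' ') return '\\s';
return char.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}).join(eleTag);
const regex = new RegExp(sReg, 'ig');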
let match;
let pos = 0;//end position of the previous match
while((match = regex.exec(text)) !== null){//found a match
//Copy the untouched HTML between the previous match and this one
highlightedHTML += text.substring(pos, match.index);
pos = regex.lastIndex;
/*
* If the matched text contains HTML tags, close the highlight span before each run of tags
* and reopen it afterwards, so that the original HTML structure is not broken.
*/
const tmp = match[0].replaceAll(/((?:<[^>]+>)+)/g,'</span>$1<span class="highlight">');
//Wrap the whole match in an opening and a closing highlight tag and append it to the final HTML string
highlightedHTML += `<span class="highlight">${tmp}</span>`;
}
//Replace the original HTML with the highlighted version, keeping whatever follows the last match
textContainer.innerHTML = `${highlightedHTML}${text.substring(pos)}`;
}
</script>
</body>
</html>
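The tag-splitting step in the script above can also be tried outside a browser. The following is only a sketch under assumptions: highlightHtml is a hypothetical pure-string function, it takes the HTML as a string instead of reading innerHTML, and it escapes all regular-expression metacharacters rather than just the space and the dot.

```javascript
// Hypothetical pure-string version of the highlighting step (no DOM needed).
function highlightHtml(html, term) {
  if (term === '') return html; // nothing to search for
  const tagGap = '(?:<[^>]+>)*'; // zero or more HTML tags between characters
  const source = Array.from(term, ch => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')).join(tagGap);
  return html.replace(new RegExp(source, 'gi'), match =>
    '<span class="highlight">' +
    // Close the span before each run of tags and reopen it afterwards,
    // so the highlight spans never straddle an existing tag.
    match.replace(/(?:<[^>]+>)+/g, tags => `</span>${tags}<span class="highlight">`) +
    '</span>'
  );
}

console.log(highlightHtml('我i<b>s们This</b>', 'is'));
// 我<span class="highlight">i</span><b><span class="highlight">s</span>们Th<span class="highlight">is</span></b>
```

Like the page itself, this sketch works on the raw HTML string, so a search term that also occurs inside a tag's attributes could still be matched there; it is not a replacement for a library such as mark.js.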
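Why is there so much English in that page? Because English is what the AI gave me. The only changes I made to the HTML content were to insert two Chinese characters and to put an "i" before the original <b> tag and an "s" after it. Here is a screenshot of it running:
Searching once more:
A search with no matching content:
Anyway, this was just for fun, so I will leave it at that. The original file the AI gave me is attached below: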
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Search Highlighter</title>
<style>
.highlight {
background-color: yellow;
}
</style>
</head>
<body>
<input type="text" id="searchInput" placeholder="Enter search term">
<button onclick="search()">Search</button>
<div id="textContainer">
This is some sample text to search through. You can replace this with your own content.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed et quam eget ipsum ullamcorper hendrerit.
Donec vitae quam eget quam aliquam tincidunt. Nullam pharetra quam ac quam tincidunt, at semper ipsum
varius. Donec nec enim at libero ullamcorper hendrerit. Sed ac quam eget quam aliquam tincidunt.
Nullam pharetra quam ac quam tincidunt, at semper ipsum varius.
<b>This is some bold text.</b>
<i>This is some italic text.</i>
<a href="#">This is a link.</a>
</div>
<script>
function search() {
const searchTerm = document.getElementById('searchInput').value.toLowerCase();
const textContainer = document.getElementById('textContainer');
const text = textContainer.textContent.toLowerCase();
const regex = new RegExp(`(<[^>]+>|\ )|(${searchTerm})`, 'ig');
let highlightedHTML = '';
const textParts = text.replace(regex, function(match, htmlTag, searchTerm) {
if (htmlTag) {
return htmlTag;
} else {
return `<span class="highlight">${searchTerm}</span>`;
}
}).split(/<\/?span>/);
for (const part of textParts) {
if (part.startsWith('<span')) {
highlightedHTML += `<span class="highlight">${part.slice(6)}</span>`;
} else {
highlightedHTML += part;
}
}
textContainer.innerHTML = highlightedHTML;
}
</script>
</body>
</html>
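It does not work at all: it reads textContent, which contains no tags, so the original markup is lost as soon as the result is written back to innerHTML. Still, the regular expression "<[^>]+>" and the variable name "textParts" in it were the source of the idea behind my final code.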