Simple dynamic sensitive-word validation with special-symbol support
I previously did a quick study of sensitive-word filtering and implemented it with jieba segmentation and its bundled dictionary; see my earlier post:
Sensitive word validation
This post builds on that with a few optimizations; the steps are briefly recorded below.
1. Requirements
Our company needed a sensitive-word feature. An earlier investigation showed that jieba segmentation plus a custom dictionary was enough. Recently the company's word list kept changing, so the requirement became a dynamic sensitive-word feature that can be modified at any time, supports common punctuation inside words, and takes effect as soon as a word is added. I made the following changes on top of the original implementation.
2. Implementation
The main steps, simplified:
- Add the jieba pom dependency and prepare the initial dictionary data.
- Read the configuration, initialize the data, and load the dictionary with punctuation support. The project first reads sensitive words from /dict/custom.dict under the local resources directory; if the database has not been initialized yet, those words are inserted into it (this step is skipped once the data is present). A temporary file is then generated containing each sensitive word with its frequency, that file's contents are loaded into jieba's dictionary, and finally the file is deleted.
The custom.dict file format is as follows:
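The original screenshot of the file is not reproduced here; based on how the loader below writes the temporary dictionary (one entry per line: the word, a space, then the frequency), entries look like this (illustrative placeholder words, not the project's real dictionary):

```
someword 50
anotherword 80
```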
- Dictionary CRUD; whenever the dictionary changes, it must be reloaded.
- The system is a set of microservices, so when one node is modified, the other nodes must be notified so their dictionaries stay in sync.
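The fan-out in the last bullet boils down to: fetch all node IPs for the service, drop the local IP so a node does not call itself, and fire the reset endpoint at the rest. A minimal sketch of just the list/URL logic (the port and path mirror the ones used later in sendResetSignal; ResetFanout and buildResetUrls are hypothetical names):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ResetFanout {

    /** Build reset URLs for all peer nodes, excluding the local node. */
    public static List<String> buildResetUrls(List<String> allIps, String localIp) {
        return allIps.stream()
                // skip ourselves so the local node is not notified twice
                .filter(ip -> !ip.equals(localIp))
                .map(ip -> "http://" + ip + ":11102/manage/analyzer/reset")
                .collect(Collectors.toList());
    }
}
```

The actual HTTP calls would then be issued against each URL, as the full sendResetSignal method does later.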
The overall project layout is shown below:
2.1 Add the pom dependency
We use a Spring Boot project; add the following to the pom:
```xml
<!-- jieba segmentation, used for sensitive-word matching -->
<dependency>
    <groupId>com.huaban</groupId>
    <artifactId>jieba-analysis</artifactId>
    <version>1.0.2</version>
</dependency>
```
2.2 Load the dictionary at startup
- Add the dictionary table
```sql
DROP TABLE MANAGE.TB_SYS_SENSITIVE_WORDS;

-- TbSysSensitiveWords
CREATE TABLE MANAGE.TB_SYS_SENSITIVE_WORDS (
    ID                      varchar(32),
    SENSITIVE_WORD          varchar(100),
    SENSITIVE_EXCHANGE_WORD varchar(100),
    WORD_FREQUENCY          number,
    WORD_DESC               varchar(200),
    ctime                   date DEFAULT sysdate,
    mtime                   date DEFAULT sysdate,
    is_del                  varchar(1) DEFAULT 0,
    primary key (ID)
);

COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.ID IS 'Primary key';
COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.SENSITIVE_WORD IS 'Sensitive word';
COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.SENSITIVE_EXCHANGE_WORD IS 'Sensitive word after special symbols are escaped';
COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.WORD_FREQUENCY IS 'Word frequency; if a word is not matched, raise the frequency to make it take effect. Default 50';
COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.WORD_DESC IS 'Word remark/description';
COMMENT ON TABLE MANAGE.TB_SYS_SENSITIVE_WORDS IS 'Sensitive word dictionary table';
```
- Add the initialization method
```java
package cn.git.manage.init;

import cn.git.manage.util.CommonAnalyzerUtil;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;

/**
 * @description: dictionary initialization
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Component
public class CommonAnalyzerInit {

    @Autowired
    private CommonAnalyzerUtil analyzerUtil;

    /**
     * Load the custom segmentation dictionary at startup.
     *
     * This works directly in the IDE, but on Linux the custom dictionary lives
     * inside the Spring Boot fat jar, where it cannot be read as a plain file
     * and no Path object can be obtained for it. So a temporary copy is written
     * to the local filesystem first and loaded from there.
     */
    @PostConstruct
    public void init() {
        analyzerUtil.analyzerInit();
    }
}
```
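The comment in the init class points at a real constraint: inside a fat jar, a classpath resource cannot be turned into a filesystem Path, so the stream has to be copied to a temporary file before jieba's loadUserDict (which takes a Path) can read it. A minimal sketch of that copy step, with a hypothetical helper name copyToTempFile:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DictResourceCopier {

    /**
     * Copy an input stream (e.g. a classpath resource from inside a jar) to a
     * temporary file and return its path, so APIs that require a Path can use it.
     */
    public static Path copyToTempFile(InputStream in) throws IOException {
        Path temp = Files.createTempFile("custom_tmp", ".dict");
        temp.toFile().deleteOnExit();
        Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        // The caller would then do: WordDictionary.getInstance().loadUserDict(temp);
        return temp;
    }
}
```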
- Add the CommonAnalyzerUtil utility class
Dictionary loading and special-symbol escaping both live in this class:

```java
package cn.git.manage.util;

import cn.git.common.exception.ServiceException;
import cn.git.common.util.LogUtil;
import cn.git.common.util.ServerIpUtil;
import cn.git.elk.util.NetUtil;
import cn.git.manage.entity.TbSysSensitiveWords;
import cn.git.manage.mapper.TbSysSensitiveWordsMapper;
import cn.hutool.core.util.IdUtil;
import cn.hutool.core.util.ObjectUtil;
import cn.hutool.core.util.StrUtil;
import cn.hutool.http.HttpUtil;
import com.huaban.analysis.jieba.WordDictionary;
import lombok.extern.slf4j.Slf4j;
import org.apache.ibatis.session.ExecutorType;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

/**
 * @description: common dictionary helper methods
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Slf4j
@Component
public class CommonAnalyzerUtil {

    @Autowired
    private TbSysSensitiveWordsMapper sensitiveWordsMapper;

    /**
     * In-memory sensitive word set
     */
    public static Set<String> SENSITIVE_WORDS_SET = new HashSet<>();

    /**
     * Custom dictionary path
     */
    private static final String DICT_PATH = "/dict/custom.dict";

    /**
     * Temp file name
     */
    private static final String TEMP_FILE_NAME = "custom_tmp.dict";

    /**
     * Windows OS marker
     */
    private static final String WINDOWS_SYS = "windows";

    /**
     * OS name system property
     */
    private static final String OS_FLAG = "os.name";

    /**
     * Current project directory property
     */
    private static final String USER_DIR = "user.dir";

    @Autowired
    private ServerIpUtil serverIpUtil;

    @Autowired
    private SqlSessionFactory sqlSessionFactory;

    /**
     * Special symbol set
     */
    private static final Set<Character> SPECIAL_SYMBOLS = new HashSet<>();

    /**
     * Escape and restore maps for sensitive words
     */
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    private static final Map<String, Character> REVERSE_REPLACEMENTS = new HashMap<>();

    /**
     * Static block: initialize the special symbol set and replacement maps
     */
    static {
        // Special symbols that may appear in sensitive words
        // (both ASCII and full-width variants)
        SPECIAL_SYMBOLS.addAll(Arrays.asList(
                '|', '?', '#', '?', '*', '$', '^', '&', '(', ')', '(', ')',
                '{', '}', '【', '】', '[', ']', '"', '\'', ';', ':', '!', '!',
                ',', ',', '.', '<', '>', '《', '》', '%', '@', '~', '=', '_',
                ' ', '\\', '+', '-', '/'));

        // Initialize the escape map (special character -> placeholder word)
        REPLACEMENTS.put('|', "竖线");
        REPLACEMENTS.put('?', "问号");
        REPLACEMENTS.put('?', "中文问号");
        REPLACEMENTS.put('!', "中文感叹号");
        REPLACEMENTS.put('!', "感叹号");
        REPLACEMENTS.put('*', "星号");
        REPLACEMENTS.put('$', "美元");
        REPLACEMENTS.put('^', "尖号");
        REPLACEMENTS.put('\\', "反斜线");
        REPLACEMENTS.put('/', "斜线");
        REPLACEMENTS.put('&', "与");
        REPLACEMENTS.put('(', "中文左括号");
        REPLACEMENTS.put(')', "中文右括号");
        REPLACEMENTS.put('(', "左括号");
        REPLACEMENTS.put(')', "右括号");
        REPLACEMENTS.put('{', "左大括号");
        REPLACEMENTS.put('}', "右大括号");
        REPLACEMENTS.put('【', "中文左中括号");
        REPLACEMENTS.put('】', "中文右中括号");
        REPLACEMENTS.put('[', "左中括号");
        REPLACEMENTS.put(']', "右中括号");
        REPLACEMENTS.put('"', "双引号");
        REPLACEMENTS.put('\'', "单引号");
        REPLACEMENTS.put(';', "分号");
        REPLACEMENTS.put(':', "冒号");
        REPLACEMENTS.put(',', "逗号");
        REPLACEMENTS.put(',', "中文逗号");
        REPLACEMENTS.put('.', "点");
        REPLACEMENTS.put('<', "左尖括号");
        REPLACEMENTS.put('>', "右尖括号");
        REPLACEMENTS.put('《', "中文左尖括号");
        REPLACEMENTS.put('》', "中文右尖括号");
        REPLACEMENTS.put('%', "百分号");
        REPLACEMENTS.put('@', "AT");
        REPLACEMENTS.put('#', "井号");
        REPLACEMENTS.put('~', "波浪号");
        REPLACEMENTS.put('=', "等号");
        REPLACEMENTS.put(' ', "空格");
        REPLACEMENTS.put('+', "加号");
        REPLACEMENTS.put('-', "减号");
        REPLACEMENTS.put('_', "下划线");

        // Build the reverse (restore) map
        for (Map.Entry<Character, String> entry : REPLACEMENTS.entrySet()) {
            REVERSE_REPLACEMENTS.put(entry.getValue(), entry.getKey());
        }
    }

    /**
     * Initialize the sensitive words
     */
    public void analyzerInit() {
        log.info("开始执行analyzerInit!");
        // Fetch all sensitive words
        List<TbSysSensitiveWords> sensitiveWordsList = sensitiveWordsMapper.selectList(null);
        // If the table is empty, load the dict file into the database
        if (ObjectUtil.isEmpty(sensitiveWordsList)) {
            // Read the resource as a stream
            InputStream dictInputStream = this.getClass().getResourceAsStream(DICT_PATH);
            if (dictInputStream != null) {
                // Open a SqlSession in batch execution mode
                SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
                try (InputStream inputStream = dictInputStream;
                     BufferedReader bufferedReader = new BufferedReader(
                             new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
                    TbSysSensitiveWordsMapper sensitiveWordsMapper =
                            session.getMapper(TbSysSensitiveWordsMapper.class);
                    // Read the file line by line and write it to the database
                    String line;
                    while ((line = bufferedReader.readLine()) != null) {
                        // Sensitive words may contain spaces
                        TbSysSensitiveWords tbSysSensitiveWords = new TbSysSensitiveWords();
                        tbSysSensitiveWords.setId(IdUtil.simpleUUID());
                        // Lowercase and strip spaces
                        tbSysSensitiveWords.setSensitiveWord(
                                line.toLowerCase().replaceAll(StrUtil.SPACE, StrUtil.EMPTY));
                        // Default frequency: 50
                        tbSysSensitiveWords.setWordFrequency(50);
                        if (checkSpecialSymbol(line)) {
                            tbSysSensitiveWords.setSensitiveExchangeWord(exchangeSensitiveWord(line));
                        }
                        tbSysSensitiveWords.setWordDesc(line);
                        sensitiveWordsMapper.insert(tbSysSensitiveWords);
                    }
                    session.commit();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                } finally {
                    // Always close the SqlSession
                    session.close();
                }
            } else {
                throw new ServiceException(
                        StrUtil.format("获取文件[{}]失败,请确认文件存在!", DICT_PATH));
            }
        }

        // Query the table again after the initial insert
        if (ObjectUtil.isEmpty(sensitiveWordsList)) {
            sensitiveWordsList = sensitiveWordsMapper.selectList(null);
        }

        // Generate a temporary custom.dict file in the working directory,
        // to be loaded into the custom dictionary
        String userDir = System.getProperty(USER_DIR);
        String tempFilePath = userDir.concat(File.separator).concat(TEMP_FILE_NAME);
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(tempFilePath))) {
            int size = sensitiveWordsList.size();
            for (int i = 0; i < size; i++) {
                TbSysSensitiveWords sysSensitiveWord = sensitiveWordsList.get(i);
                // Null checks
                if (ObjectUtil.isNull(sysSensitiveWord.getSensitiveWord())
                        || ObjectUtil.isNull(sysSensitiveWord.getWordFrequency())) {
                    throw new ServiceException("初始化敏感词表部分信息为空,请确认信息是否完整!");
                }
                String sensitiveExchangeWord = sysSensitiveWord.getSensitiveExchangeWord();
                // Note: if a word is not matched, raise its frequency to make it take effect
                Integer wordFrequency = sysSensitiveWord.getWordFrequency();
                // Add the word to the in-memory set; if it contains special symbols,
                // use the escaped form instead
                String line;
                if (StrUtil.isNotBlank(sensitiveExchangeWord)) {
                    SENSITIVE_WORDS_SET.add(sensitiveExchangeWord);
                    line = sensitiveExchangeWord.concat(StrUtil.SPACE).concat(wordFrequency.toString());
                } else {
                    SENSITIVE_WORDS_SET.add(sysSensitiveWord.getSensitiveWord());
                    line = sysSensitiveWord.getSensitiveWord()
                            .concat(StrUtil.SPACE).concat(wordFrequency.toString());
                }
                writer.write(line);
                // No newline after the last entry
                if (i < size - 1) {
                    writer.newLine();
                }
            }
        } catch (IOException e) {
            String errorMessage = LogUtil.getStackTraceInfo(e);
            throw new ServiceException(
                    StrUtil.format("自定义词典文件写入异常,异常信息为:{}", errorMessage));
        }

        // Load the temp file into jieba's dictionary, then delete it
        File dictTempFile = new File(tempFilePath);
        if (dictTempFile.exists()) {
            log.info("开始加载敏感词信息!");
            WordDictionary.getInstance().loadUserDict(dictTempFile.toPath());
            log.info("加载敏感词信息完毕!");
            boolean delete = dictTempFile.delete();
            if (delete) {
                log.info("删除临时文件成功!");
            } else {
                log.info("删除临时文件失败!");
            }
        } else {
            throw new ServiceException("自定义词典文件不存在,请检查确认!");
        }
    }

    /**
     * Tell the other nodes to rebuild their segmenters
     */
    public void sendResetSignal() {
        // Get the IPs of all nodes running this service
        List<String> ipList = serverIpUtil.getServerIpListByName("management-server");
        // Remove the local IP so we do not call ourselves
        String localIp = NetUtil.getLocalIp();
        if (StrUtil.isNotBlank(localIp)) {
            ipList.remove(localIp);
        }
        log.info("发送重置信号到服务ip:{},去除本机ip为[{}]",
                String.join(StrUtil.COMMA, ipList), localIp);
        // Fire the reset request at each remaining node
        ipList.forEach(ip -> new Thread(() -> {
            String uri = "http://".concat(ip).concat(":").concat("11102")
                    .concat("/manage/analyzer/reset");
            HttpUtil.get(uri, 30000);
        }).start());
    }

    /**
     * Check whether the string contains any special symbol
     */
    public boolean checkSpecialSymbol(String content) {
        // Parameter check
        if (StrUtil.isBlank(content)) {
            throw new ServiceException("校验字符串是否包含特殊符号参数为空,请检查参数是否正确!");
        }
        for (char symbol : SPECIAL_SYMBOLS) {
            if (content.indexOf(symbol) != -1) {
                return true;
            }
        }
        return false;
    }

    /**
     * Escape a sensitive word: replace each special character with its placeholder word
     */
    public String exchangeSensitiveWord(String content) {
        StringBuilder result = new StringBuilder();
        for (char c : content.toCharArray()) {
            if (REPLACEMENTS.containsKey(c)) {
                result.append(REPLACEMENTS.get(c));
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    /**
     * Restore an escaped sensitive word
     *
     * @param content the escaped content
     * @return the restored content
     */
    public String restoreSensitiveWord(String content) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        // Scan the escaped string, turning placeholder words back into characters
        while (i < content.length()) {
            char currentChar = content.charAt(i);
            boolean foundReplacement = false;
            for (Map.Entry<String, Character> entry : REVERSE_REPLACEMENTS.entrySet()) {
                String replacement = entry.getKey();
                // Does a placeholder word start at this position?
                if (content.startsWith(replacement, i)) {
                    result.append(entry.getValue());
                    i += replacement.length();
                    foundReplacement = true;
                    break;
                }
            }
            // No placeholder matched: keep the character as-is
            if (!foundReplacement) {
                result.append(currentChar);
                i++;
            }
        }
        return result.toString();
    }
}
```
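The escape/restore pair above can be exercised in isolation. The sketch below reproduces the idea with a trimmed-down symbol map (three symbols and English placeholder words, not the project's full Chinese map) to show that exchange followed by restore is an identity. The class name SymbolRoundTrip and its map contents are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class SymbolRoundTrip {

    // Trimmed-down mapping: special character -> placeholder word (illustrative values)
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    private static final Map<String, Character> REVERSE = new HashMap<>();

    static {
        REPLACEMENTS.put('|', "PIPE");
        REPLACEMENTS.put('?', "QMARK");
        REPLACEMENTS.put('%', "PERCENT");
        REPLACEMENTS.forEach((c, s) -> REVERSE.put(s, c));
    }

    /** Replace each mapped special character with its placeholder word. */
    public static String exchange(String content) {
        StringBuilder sb = new StringBuilder();
        for (char c : content.toCharArray()) {
            sb.append(REPLACEMENTS.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }

    /** Scan for placeholder words and turn them back into the original characters. */
    public static String restore(String content) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < content.length()) {
            boolean matched = false;
            for (Map.Entry<String, Character> e : REVERSE.entrySet()) {
                if (content.startsWith(e.getKey(), i)) {
                    sb.append(e.getValue());
                    i += e.getKey().length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                sb.append(content.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }
}
```

One caveat of the design (which applies equally to the project's Chinese placeholder words): restore is ambiguous if the plain text itself happens to contain a placeholder word, so the placeholders should be strings that are unlikely to appear in real input.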
2.3 Dictionary CRUD
The CRUD part is standard table handling: a controller and a service, plus the analyzerCheck method for sensitive-word validation. The implementation classes are as follows:
- AnalyzerController
```java
package cn.git.manage.controller;

import cn.git.common.result.Result;
import cn.git.manage.dto.CommonAnalyzerDTO;
import cn.git.manage.service.AnalyzerService;
import cn.git.manage.vo.analyzer.AnalyzerAddInVO;
import cn.git.manage.vo.analyzer.AnalyzerCheckInVO;
import cn.git.manage.vo.analyzer.AnalyzerPageInVO;
import cn.git.manage.vo.analyzer.AnalyzerPageOutVO;
import io.swagger.annotations.*;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

import javax.validation.Valid;

/**
 * @description: sensitive word controller
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Slf4j
@Api(tags = "系统管理=>敏感词管理")
@RestController
@RequestMapping("/manage")
public class AnalyzerController {

    @Autowired
    private AnalyzerService analyzerService;

    /**
     * Add a sensitive word
     */
    @ApiOperation(value = "添加敏感词", notes = "添加敏感词")
    @ApiResponses({@ApiResponse(code = 1, message = "OK", response = Result.class),
            @ApiResponse(code = -1, message = "error", response = Result.class)})
    @PostMapping("/analyzer/add")
    public Result<String> addSensitiveWords(
            @ApiParam(name = "analyzerAddInVO", value = "敏感词分词器分词inVO", required = true)
            @RequestBody @Valid AnalyzerAddInVO analyzerAddInVO) {
        // Convert the parameters
        CommonAnalyzerDTO commonAnalyzerDTO = new CommonAnalyzerDTO();
        commonAnalyzerDTO.setSensitiveWord(analyzerAddInVO.getSensitiveWord());
        commonAnalyzerDTO.setWordFrequency(analyzerAddInVO.getWordFrequency());
        commonAnalyzerDTO.setWordDesc(analyzerAddInVO.getWordDesc());
        analyzerService.addSensitiveWords(commonAnalyzerDTO);
        return Result.ok("数据修改成功,数据字典加载需要半分钟,敏感词半分钟后生效!");
    }

    /**
     * Delete a sensitive word
     */
    @ApiOperation(value = "删除敏感词", notes = "删除敏感词")
    @ApiResponses({@ApiResponse(code = 1, message = "OK", response = Result.class),
            @ApiResponse(code = -1, message = "error", response = Result.class)})
    @GetMapping("/analyzer/delete/{id}")
    public Result<String> deleteSensitiveWords(@PathVariable("id") String id) {
        analyzerService.deleteSensitiveWordById(id);
        return Result.ok("删除成功!");
    }

    /**
     * Page query for sensitive words
     */
    @ApiOperation(value = "分页查询敏感词", notes = "分页查询敏感词")
    @ApiResponses({@ApiResponse(code = 1, message = "OK", response = Result.class),
            @ApiResponse(code = -1, message = "error", response = Result.class)})
    @PostMapping("/analyzer/page")
    public Result<AnalyzerPageOutVO> getAnalyzerPageBean(
            @ApiParam(name = "commonAnalyzerDTO", value = "分页查询敏感词inVO", required = true)
            @RequestBody AnalyzerPageInVO analyzerPageInVO) {
        // Convert the parameters
        CommonAnalyzerDTO commonAnalyzerDTO = new CommonAnalyzerDTO();
        commonAnalyzerDTO.setSensitiveWord(analyzerPageInVO.getSensitiveWord());
        commonAnalyzerDTO.setWordDesc(analyzerPageInVO.getWordDesc());
        // Page query
        CommonAnalyzerDTO analyzerPageBean = analyzerService.getAnalyzerPageBean(commonAnalyzerDTO);
        AnalyzerPageOutVO outVO = new AnalyzerPageOutVO();
        outVO.setPageBean(analyzerPageBean.getPageBean());
        return Result.ok(outVO);
    }

    /**
     * Sensitive word check
     */
    @ApiOperation(value = "敏感词校验", notes = "敏感词校验")
    @ApiResponses({@ApiResponse(code = 1, message = "OK", response = Result.class),
            @ApiResponse(code = -1, message = "error", response = Result.class)})
    @PostMapping("/analyzer/check")
    public Result analyzerCheck(@RequestBody AnalyzerCheckInVO analyzerCheckInVO) {
        analyzerService.checkAnalyzer(analyzerCheckInVO);
        return Result.ok("校验通过!");
    }

    /**
     * Reset the segmenter
     */
    @ApiOperation(value = "重置分词", notes = "重置分词")
    @GetMapping("/analyzer/reset")
    public Result<String> reset() {
        analyzerService.resetAnalyzer();
        return Result.ok("重置成功!");
    }
}
```
- AnalyzerService
```java
package cn.git.manage.service;

import cn.git.manage.dto.CommonAnalyzerDTO;
import cn.git.manage.vo.analyzer.AnalyzerCheckInVO;

/**
 * @description: sensitive word service
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
public interface AnalyzerService {

    /**
     * Add a sensitive word
     */
    void addSensitiveWords(CommonAnalyzerDTO commonAnalyzerDTO);

    /**
     * Delete a sensitive word by id
     */
    void deleteSensitiveWordById(String id);

    /**
     * Page query for sensitive words
     */
    CommonAnalyzerDTO getAnalyzerPageBean(CommonAnalyzerDTO commonAnalyzerDTO);

    /**
     * Sensitive word check
     */
    void checkAnalyzer(AnalyzerCheckInVO analyzerCheckInVO);

    /**
     * Reset the segmenter
     */
    void resetAnalyzer();
}
```
- AnalyzerServiceImpl
```java
package cn.git.manage.service.impl;

import cn.git.common.exception.ServiceException;
import cn.git.common.page.CustomPageUtil;
import cn.git.common.page.PageBean;
import cn.git.common.page.PaginationContext;
import cn.git.manage.dto.CommonAnalyzerDTO;
import cn.git.manage.entity.TbSysSensitiveWords;
import cn.git.manage.mapper.TbSysSensitiveWordsMapper;
import cn.git.manage.service.AnalyzerService;
import cn.git.manage.util.CommonAnalyzerUtil;
import cn.git.manage.vo.analyzer.AnalyzerCheckEntity;
import cn.git.manage.vo.analyzer.AnalyzerCheckInVO;
import cn.hutool.core.text.StrBuilder;
import cn.hutool.core.util.IdUtil;
import cn.hutool.core.util.StrUtil;
import com.alibaba.fastjson.JSONObject;
import com.baomidou.mybatisplus.core.conditions.query.QueryWrapper;
import com.huaban.analysis.jieba.JiebaSegmenter;
import com.huaban.analysis.jieba.SegToken;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

/**
 * @description: sensitive word service implementation
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Slf4j
@Service
public class AnalyzerServiceImpl implements AnalyzerService {

    @Autowired
    private TbSysSensitiveWordsMapper sensitiveWordsMapper;

    @Autowired
    private CommonAnalyzerUtil commonAnalyzerUtil;

    @Autowired
    private CustomPageUtil customPageUtil;

    /**
     * Sensitive word check
     */
    @Override
    public void checkAnalyzer(AnalyzerCheckInVO analyzerCheckInVO) {
        // Fields to validate
        List<AnalyzerCheckEntity> checkEntityList = analyzerCheckInVO.getCheckEntityList();
        // Create the segmenter
        JiebaSegmenter jiebaSegmenter = new JiebaSegmenter();
        // Start checking
        StrBuilder errorMessageBuilder = new StrBuilder();
        for (AnalyzerCheckEntity entity : checkEntityList) {
            // Lowercase and strip spaces
            String lowerCaseStr = entity.getCheckContent()
                    .replace(StrUtil.SPACE, StrUtil.EMPTY).toLowerCase();
            // If the content contains special symbols, escape them first
            boolean ifSpecialSymbol = commonAnalyzerUtil.checkSpecialSymbol(lowerCaseStr);
            if (ifSpecialSymbol) {
                lowerCaseStr = commonAnalyzerUtil.exchangeSensitiveWord(lowerCaseStr);
            }
            // Segment the content
            List<SegToken> segTokenList = jiebaSegmenter.process(lowerCaseStr, JiebaSegmenter.SegMode.INDEX);
            log.info("[{}]分词结果为 : {}", entity.getCheckContentDesc(),
                    JSONObject.toJSONString(segTokenList.stream()
                            .map(word -> word.word).collect(Collectors.toList())));
            String uncheckWord = "";
            for (SegToken segToken : segTokenList) {
                if (CommonAnalyzerUtil.SENSITIVE_WORDS_SET.contains(segToken.word)) {
                    // Restore the original form before reporting
                    if (ifSpecialSymbol) {
                        uncheckWord = uncheckWord
                                .concat(commonAnalyzerUtil.restoreSensitiveWord(segToken.word))
                                .concat(StrUtil.COMMA);
                    } else {
                        uncheckWord = uncheckWord.concat(segToken.word).concat(StrUtil.COMMA);
                    }
                }
            }
            if (StrUtil.isNotBlank(uncheckWord)) {
                // Trim the trailing comma
                uncheckWord = uncheckWord.substring(0, uncheckWord.length() - 1);
                String errorMessage = StrUtil.format("[{}]包含敏感词[{}]\n",
                        entity.getCheckContentDesc(), uncheckWord);
                errorMessageBuilder.append(errorMessage);
            }
        }
        // Report the result
        if (StrUtil.isNotBlank(errorMessageBuilder.toString())) {
            throw new ServiceException(errorMessageBuilder.toString());
        }
    }

    /**
     * Delete a sensitive word by id
     */
    @Override
    public void deleteSensitiveWordById(String id) {
        int delNum = sensitiveWordsMapper.deleteById(id);
        if (delNum > 0) {
            log.info("通过id[{}]删除敏感词成功,重新加载敏感词信息", id);
            new Thread(() -> {
                commonAnalyzerUtil.analyzerInit();
                commonAnalyzerUtil.sendResetSignal();
            }).start();
        }
    }

    /**
     * Add a sensitive word
     */
    @Override
    public void addSensitiveWords(CommonAnalyzerDTO commonAnalyzerDTO) {
        // Check whether the word contains special symbols
        boolean ifSpecialSymbol = commonAnalyzerUtil.checkSpecialSymbol(commonAnalyzerDTO.getSensitiveWord());
        // Build and insert the entity
        TbSysSensitiveWords tbSysSensitiveWords = new TbSysSensitiveWords();
        tbSysSensitiveWords.setSensitiveWord(commonAnalyzerDTO.getSensitiveWord());
        tbSysSensitiveWords.setWordFrequency(commonAnalyzerDTO.getWordFrequency());
        // Escape special symbols if present
        if (ifSpecialSymbol) {
            tbSysSensitiveWords.setSensitiveExchangeWord(
                    commonAnalyzerUtil.exchangeSensitiveWord(commonAnalyzerDTO.getSensitiveWord()));
        }
        tbSysSensitiveWords.setWordDesc(commonAnalyzerDTO.getWordDesc());
        tbSysSensitiveWords.setId(IdUtil.simpleUUID());
        int insertNum = sensitiveWordsMapper.insert(tbSysSensitiveWords);
        if (insertNum > 0) {
            // Reload asynchronously in a new thread
            new Thread(() -> {
                // Reload the sensitive words
                commonAnalyzerUtil.analyzerInit();
                // Tell the other nodes to reset
                commonAnalyzerUtil.sendResetSignal();
            }).start();
        }
    }

    /**
     * Page query for sensitive words
     */
    @Override
    public CommonAnalyzerDTO getAnalyzerPageBean(CommonAnalyzerDTO commonAnalyzerDTO) {
        // Conditional query
        QueryWrapper<TbSysSensitiveWords> wrapper = new QueryWrapper<>();
        wrapper.lambda()
                .like(StrUtil.isNotBlank(commonAnalyzerDTO.getSensitiveWord()),
                        TbSysSensitiveWords::getSensitiveWord, commonAnalyzerDTO.getSensitiveWord())
                .like(StrUtil.isNotBlank(commonAnalyzerDTO.getWordDesc()),
                        TbSysSensitiveWords::getWordDesc, commonAnalyzerDTO.getWordDesc());
        // Fetch the list
        List<TbSysSensitiveWords> sensitiveWordsList = sensitiveWordsMapper.selectList(wrapper);
        // Paginate
        PageBean<TbSysSensitiveWords> pageBean = customPageUtil.setFlowListPage(
                sensitiveWordsList, PaginationContext.getPageNum(), PaginationContext.getPageSize());
        commonAnalyzerDTO.setPageBean(pageBean);
        return commonAnalyzerDTO;
    }

    /**
     * Reset the segmenter
     */
    @Override
    public void resetAnalyzer() {
        commonAnalyzerUtil.analyzerInit();
    }
}
```
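One small readability note on checkAnalyzer above: the manual concat-then-trim of the trailing comma can be replaced by a StringJoiner, which only inserts separators between elements. A self-contained sketch of collecting hits against a word set this way (HitCollector and the set contents are illustrative, not project code):

```java
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

public class HitCollector {

    /** Join every token found in the sensitive-word set with commas, no trailing separator. */
    public static String collectHits(List<String> tokens, Set<String> sensitiveWords) {
        StringJoiner joiner = new StringJoiner(",");
        for (String token : tokens) {
            if (sensitiveWords.contains(token)) {
                joiner.add(token);
            }
        }
        return joiner.toString();
    }
}
```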
- Parameter VOs, DTOs, and the database entity
AnalyzerAddInVO

```java
package cn.git.manage.vo.analyzer;

import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import javax.validation.constraints.NotBlank;
import javax.validation.constraints.NotNull;

/**
 * @description: sensitive word add inVO
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerAddInVO", description = "敏感词分词器分词inVO")
public class AnalyzerAddInVO {

    @NotBlank(message = "敏感词不能为空")
    @ApiModelProperty(value = "敏感词,必填")
    private String sensitiveWord;

    @NotNull(message = "词频不能为空")
    @ApiModelProperty(value = "词频,必填")
    private Integer wordFrequency;

    @ApiModelProperty(value = "备注词语描述,非必填")
    private String wordDesc;
}
```
AnalyzerCheckEntity
```java
package cn.git.manage.vo.analyzer;

import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import javax.validation.constraints.NotBlank;

/**
 * @description: check entity
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerCheckEntity", description = "敏感词校验实体对象")
public class AnalyzerCheckEntity {

    @NotBlank(message = "校验字段描述信息不能为空!")
    @ApiModelProperty(value = "校验字段描述信息,eg: 贷款用途")
    private String checkContentDesc;

    @NotBlank(message = "校验字段详情不能为空!")
    @ApiModelProperty(value = "校验字段详情")
    private String checkContent;
}
```
AnalyzerCheckInVO
```java
package cn.git.manage.vo.analyzer;

import io.swagger.annotations.ApiModel;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import javax.validation.constraints.NotNull;
import java.util.List;

/**
 * @description: sensitive word check inVO
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerCheckInVO", description = "敏感词分词器分词inVO")
public class AnalyzerCheckInVO {

    /**
     * Fields to validate
     */
    @NotNull(message = "校验字段列表不能为空")
    private List<AnalyzerCheckEntity> checkEntityList;
}
```
AnalyzerPageInVO
```java
package cn.git.manage.vo.analyzer;

import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/**
 * @description: sensitive word page query inVO
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerPageInVO", description = "敏感词page查询inVO")
public class AnalyzerPageInVO {

    @ApiModelProperty(value = "敏感词")
    private String sensitiveWord;

    @ApiModelProperty(value = "备注词语描述")
    private String wordDesc;
}
```
AnalyzerPageOutVO
```java
package cn.git.manage.vo.analyzer;

import cn.git.common.page.PageBean;
import cn.git.manage.entity.TbSysSensitiveWords;
import io.swagger.annotations.ApiModel;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/**
 * @description: sensitive word page query outVO
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerPageOutVO", description = "敏感词page查询outVO")
public class AnalyzerPageOutVO {

    /**
     * Page data
     */
    PageBean<TbSysSensitiveWords> pageBean;
}
```
Database entity
TbSysSensitiveWords
```java
package cn.git.manage.entity;

import com.baomidou.mybatisplus.annotation.*;
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.experimental.Accessors;

import java.util.Date;

/**
 * @description: sensitive word dictionary table
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@EqualsAndHashCode(callSuper = false)
@Accessors(chain = true)
@TableName("TB_SYS_SENSITIVE_WORDS")
public class TbSysSensitiveWords {

    /**
     * Primary key
     */
    @TableId(value = "ID", type = IdType.ASSIGN_ID)
    private String id;

    /**
     * Sensitive word
     */
    @TableField("SENSITIVE_WORD")
    private String sensitiveWord;

    /**
     * Escaped form of the sensitive word
     */
    @TableField("SENSITIVE_EXCHANGE_WORD")
    private String sensitiveExchangeWord;

    /**
     * Word frequency
     */
    @TableField("WORD_FREQUENCY")
    private Integer wordFrequency;

    /**
     * Word remark/description
     */
    @TableField("WORD_DESC")
    private String wordDesc;

    /**
     * Creation time
     */
    @TableField(value = "CTIME", fill = FieldFill.INSERT)
    private Date ctime;

    /**
     * Update time
     */
    @TableField(value = "MTIME", fill = FieldFill.UPDATE)
    private Date mtime;

    /**
     * Soft-delete flag
     */
    @TableField(value = "IS_DEL")
    private String isDel;
}
```
2.4 Notifying other services of changes
- ServerIpUtil
Our project is a set of microservices, and each module runs multiple nodes. After modifying a single node we need to tell the other nodes to update as well, so we need the IPs of the module's other nodes from the registry. Whenever the sensitive words change, the other nodes must be notified, so a small IP-lookup utility is provided. The add and delete flows call the new notification method, and the local IP is removed from the list beforehand to avoid the node calling itself.

```java
package cn.git.common.util;

import cn.git.common.exception.ServiceException;
import cn.hutool.core.util.ObjectUtil;
import cn.hutool.core.util.StrUtil;
import com.alibaba.cloud.nacos.NacosDiscoveryProperties;
import com.alibaba.cloud.nacos.NacosServiceManager;
import com.alibaba.nacos.api.exception.NacosException;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.pojo.Instance;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

/**
 * Look up the currently available service IPs by server name
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2022-05-18
 */
@Slf4j
@Component
@EnableScheduling
public class ServerIpUtil {

    /**
     * Cached server list, keyed by service name
     */
    private static Map<String, List<String>> convertServerListMap = new HashMap<>();

    @Autowired
    private NacosDiscoveryProperties discoveryProperties;

    @Autowired
    private NacosServiceManager nacosServiceManager;

    /**
     * Service names to pre-cache
     */
    private static final String[] SERVER_NAME_ARR = {"uaa-server", "converter-server"};

    /**
     * Request counter limit; the counter resets at this value
     */
    private static final Integer MAX_REQ_COUNT = 1000;

    /**
     * Minimum number of service nodes
     */
    private static final Integer INIT_SERVER_SIZE = 1;

    /**
     * Round-robin counter
     */
    private AtomicInteger serversCount = new AtomicInteger(0);

    /**
     * Get one available service IP by service name
     * @return a service IP
     */
    public String getIpFromServerName(String serverName) {
        // Fetch the ip/port list registered in nacos for this service name,
        // then pick one
        String convertServerIp = null;
        try {
            List<Instance> convertServerList;
            if (ObjectUtil.isEmpty(convertServerListMap.get(serverName))) {
                log.info("获取全部nacos对应[{}]namingService!", serverName);
                NamingService configService =
                        nacosServiceManager.getNamingService(discoveryProperties.getNacosProperties());
                // Fetch the instance list
                convertServerList = configService.getAllInstances(serverName);
                log.info("获取服务nacos对应[{}]服务全部在线服务列表信息成功!", serverName);
                if (ObjectUtil.isNotEmpty(convertServerList)) {
                    List<String> serverIpList = convertServerList.stream()
                            .map(Instance::getIp).collect(Collectors.toList());
                    convertServerListMap.put(serverName, serverIpList);
                }
            }
            // Round-robin selection
            if (ObjectUtil.isNotEmpty(convertServerListMap.get(serverName))) {
                Integer selectServerIndex =
                        serversCount.incrementAndGet() % convertServerListMap.get(serverName).size();
                convertServerIp = convertServerListMap.get(serverName).get(selectServerIndex);
            }
            // Reset the counter every 1000 requests
            if (serversCount.get() == MAX_REQ_COUNT) {
                serversCount.set(0);
            }
        } catch (NacosException e) {
            log.error("通过服务名称[{}]获取服务ip失败!", serverName);
            e.printStackTrace();
        }
        if (StrUtil.isBlank(convertServerIp)) {
            throw new RuntimeException(StrUtil.format("通过服务名称[{}]获取服务ip失败!", serverName));
        }
        log.info("通过服务名称[{}]成功获取服务ip[{}]地址!", serverName, convertServerIp);
        return convertServerIp;
    }

    /**
     * Get all available service IPs by service name
     * @return list of service IPs
     */
    public List<String> getServerIpListByName(String serverName) {
        try {
            List<Instance> convertServerList;
            NamingService configService =
                    nacosServiceManager.getNamingService(discoveryProperties.getNacosProperties());
            // Fetch the instance list
            convertServerList = configService.getAllInstances(serverName);
            if (ObjectUtil.isNotEmpty(convertServerList)) {
                return convertServerList.stream().map(Instance::getIp).collect(Collectors.toList());
            } else {
                throw new ServiceException(StrUtil.format("通过服务名称[{}]获取服务ip为空!", serverName));
            }
        } catch (NacosException e) {
            throw new ServiceException(StrUtil.format("通过服务名称[{}]获取服务ip失败!", serverName));
        }
    }

    /**
     * Scheduled task: refresh the server list cache
     * Every 20 minutes between 07:00 and 23:00
     */
    @Scheduled(cron = "0 0/20 7-23 * * ?")
    public void setServerInfoMap() {
        Arrays.stream(SERVER_NAME_ARR).forEach(serverName -> {
            if (ObjectUtil.isEmpty(convertServerListMap.get(serverName))) {
                try {
                    NamingService configService =
                            nacosServiceManager.getNamingService(discoveryProperties.getNacosProperties());
                    // Fetch the instance list
                    List<Instance> convertServerList = configService.getAllInstances(serverName);
                    if (ObjectUtil.isNotEmpty(convertServerList)) {
                        List<String> serverIpList = convertServerList.stream()
                                .map(Instance::getIp).collect(Collectors.toList());
                        log.info("获取服务[{}]服务serverInfo信息成功,当前服务共[{}]个节点!",
                                serverName, serverIpList.size());
                        convertServerListMap.put(serverName, serverIpList);
                    } else {
                        log.info("获取服务[{}]服务serverInfo信息失败!", serverName);
                    }
                } catch (NacosException e) {
                    e.printStackTrace();
                }
            }
        });
    }
}
```
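getIpFromServerName implements a simple round-robin: an AtomicInteger counter modulo the node count, reset once the counter reaches 1000. The selection itself can be isolated and tested; RoundRobinPicker below is an illustrative name, not project code:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinPicker {

    private static final int MAX_REQ_COUNT = 1000;
    private final AtomicInteger count = new AtomicInteger(0);

    /** Pick the next server in rotation; wrap the counter once it reaches MAX_REQ_COUNT. */
    public String pick(List<String> servers) {
        int index = count.incrementAndGet() % servers.size();
        if (count.get() >= MAX_REQ_COUNT) {
            count.set(0);
        }
        return servers.get(index);
    }
}
```

Note that because the counter starts at 0 and is pre-incremented, the first pick returns the element at index 1 (not 0), exactly as the original logic does.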
3. Testing
With the steps above in place, testing can begin: start the local service, set a token, and open the service's swagger test page.
Pick a random word from the backend sensitive-word table, 巫淫新骚伊宁市, and run a check.
The check endpoint and parameters are as follows:
The check result is as follows:
Then use the add endpoint to create a custom sensitive word containing special symbols (你\好|赵老四%,gaga) and add it to the dictionary:
The add succeeds with the following message:
Run the check again using the newly added word 你\好|赵老四%,gaga. The result below shows the word is now detected:
That completes the sensitive-word optimization.