Using a TrieTree (Trie) to Implement Sensitive-Word Filtering
1. Trie definition
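A trie (TrieTree) is a tree structure typically used to count, sort, and store large numbers of strings. It is not limited to strings: a 0-1 trie, for example, stores the binary digits of integers. Its main idea is to share the common prefixes of strings in order to save storage space. A trie has two main operations: insertion and lookup.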
A trie follows these rules:
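- The root node contains no character; every other node contains exactly one character.
- Concatenating the characters on the path from the root to a node yields a string, such as him, her, cat, no, and nova in the figure.
- Each string corresponds to one path in the trie.
- In the implementation, a node carries a flag indicating whether a string ends there; in this example the flag is isEnd. The flag is needed on inner nodes too, not only on leaves: in the figure, no ends at an inner node of nova.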
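To make the prefix-sharing idea concrete, here is a minimal sketch (not part of the original post) that inserts the five words from the figure and counts how many nodes get created. The class name PrefixSharingDemo and its methods are illustrative only. The 15 characters of him, her, cat, no, and nova need only 12 non-root nodes, because her reuses the h of him and no is entirely a prefix of nova.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: a minimal trie that only counts the nodes it creates.
// It follows the rules above: the root holds no character, every other node holds one,
// and isEnd marks the node where a word ends.
class PrefixSharingDemo {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    private final Node root = new Node();
    private int nodeCount = 0; // number of non-root nodes created so far

    void insert(String word) {
        Node cur = root;
        for (char ch : word.toCharArray()) {
            Node next = cur.children.get(ch);
            if (next == null) {          // prefix no longer shared: create a new node
                next = new Node();
                cur.children.put(ch, next);
                nodeCount++;
            }
            cur = next;                  // shared prefix: reuse the existing node
        }
        cur.isEnd = true;                // mark the end of the word
    }

    int nodeCount() {
        return nodeCount;
    }

    // Characters needed if every word were stored separately.
    static int totalChars(String... words) {
        int n = 0;
        for (String w : words) {
            n += w.length();
        }
        return n;
    }
}
```

After inserting the five words, `nodeCount()` returns 12 while `totalChars("him", "her", "cat", "no", "nova")` returns 15.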
For more on trie insertion, deletion, and other operations, see the following articles:
Here is my understanding of the trie, expressed as a simple TrieTree implemented in Java:
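import java.util.HashMap;
import java.util.Map;

public class TrieTree {
    public class TrieNode {
        public char value;
        public int isEnd; // 0: not the end of a word, 1: end of a word
        public Map<Character, TrieNode> children;

        public TrieNode(char value, int isEnd) {
            this.value = value;
            this.isEnd = isEnd;
            this.children = new HashMap<>();
        }

        public TrieNode(char value) {
            this.value = value;
            this.isEnd = 0;
            this.children = new HashMap<>();
        }

        public TrieNode() {
            this.isEnd = 0;
            this.children = new HashMap<>();
        }
    }

    private TrieNode root;

    public TrieTree() {
        this.root = new TrieNode();
    }

    // Insert a sensitive word
    public void insert(String str) {
        if (str == null || str.length() == 0) {
            return;
        }
        root = insert(root, str, 0);
    }

    // Check whether the string contains any sensitive word.
    // A fresh walk starts from the root at every position i. Merely jumping back to the root
    // on a mismatch would skip the mismatching character, so "华南大学生" would not match "大学生".
    public boolean match(String str) {
        if (str == null || "".equals(str)) {
            return false;
        }
        for (int i = 0; i < str.length(); i++) {
            TrieNode temp = root;
            for (int j = i; j < str.length(); j++) {
                // Move to the child node for the next character
                temp = temp.children.get(str.charAt(j));
                if (temp == null) {
                    break; // no sensitive word starts at position i
                }
                if (temp.isEnd == 1) {
                    return true;
                }
            }
        }
        return false;
    }

    // Check whether str itself is a stored sensitive word (exact match)
    public boolean contains(String str) {
        if (str == null || "".equals(str)) {
            return false;
        }
        TrieNode temp = root;
        for (int i = 0; i < str.length(); i++) {
            temp = temp.children.get(str.charAt(i));
            if (temp == null) {
                return false;
            }
        }
        return temp.isEnd == 1;
    }

    // Remove a sensitive word
    public void remove(String str) {
        // Return directly if the word is not stored. match() cannot be used for this check,
        // because it tests whether str contains a word, not whether str is one.
        if (!contains(str)) {
            return;
        }
        root = remove(root, str, 0);
    }

    private TrieNode remove(TrieNode t, String str, int index) {
        char ch = str.charAt(index);
        TrieNode child = t.children.get(ch);
        // Reached the last character of the word
        if (index == str.length() - 1) {
            if (child.children.size() > 0) {
                // The node is still a prefix of other words: only clear the end flag
                child.isEnd = 0;
            } else {
                // Otherwise delete the node
                t.children.remove(ch);
            }
            return t;
        }
        // Continue removing further down
        child = remove(child, str, index + 1);
        // Backtrack: prune the child if it has no children left and does not end another word
        if (child.children.size() == 0 && child.isEnd == 0) {
            t.children.remove(ch);
        }
        return t;
    }

    private TrieNode insert(TrieNode t, String str, int index) {
        char ch = str.charAt(index);
        TrieNode child = t.children.get(ch);
        if (child != null) {
            // The character already exists: mark the end or continue along the shared prefix
            if (index == str.length() - 1) {
                child.isEnd = 1;
                return t;
            }
            child = insert(child, str, index + 1);
            return t;
        }
        // Create a new node for this character
        child = new TrieNode(ch);
        if (index == str.length() - 1) {
            child.isEnd = 1;
        } else {
            child = insert(child, str, index + 1);
        }
        t.children.put(ch, child);
        return t;
    }

    public static void main(String[] args) {
        // Sample words: "华南理工" (short for South China University of Technology),
        // "大学生" ("college student"), "泰裤辣" (internet slang for "so cool")
        String[] sensitive = {"华南理工", "大学生", "泰裤辣"};
        TrieTree trieTree = new TrieTree();
        for (String word : sensitive) {
            trieTree.insert(word);
        }
        System.out.println(trieTree.match("我是华南大学的学生"));   // false
        System.out.println(trieTree.match("华北理工大学泰裤"));     // false
        System.out.println(trieTree.match("华南理工大学"));         // true
        System.out.println(trieTree.match("大学生"));               // true
        System.out.println(trieTree.match("大学生泰裤辣"));         // true
        System.out.println(trieTree.match("人之初性本善性相近习相远华南大学泰山崩于前而面不改色泰裤辣哈哈哈哈哈哈")); // true
        trieTree.remove("华南理工");
        System.out.println(trieTree.match("华南理工大学"));         // false
        trieTree.remove("大学生");
        System.out.println(trieTree.match("大学生"));               // false
        trieTree.remove("泰裤辣");
        System.out.println(trieTree.match("人之初性本善性相近习相远华南大学泰山崩于前而面不改色泰裤辣哈哈哈哈哈哈")); // false
    }
}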
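When scanning a text for stored words, the trie walk has to restart at every position of the input. A single pass that merely jumps back to the root on a mismatch never looks the mismatching character up again, so with 大学生 stored it misses 华南大学生: the walk 华→南 fails on 大, and 大 is skipped. The self-contained sketch below (ScanComparison, resetScan, and restartScan are hypothetical names) contrasts the two scans on an ASCII example in which cat is stored and the text is ccat.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: one trie, two ways of scanning a text for stored words.
class ScanComparison {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    void insert(String word) {
        Node cur = root;
        for (char ch : word.toCharArray()) {
            cur = cur.children.computeIfAbsent(ch, k -> new Node());
        }
        cur.isEnd = true;
    }

    // Single pass that jumps back to the root on a mismatch.
    // The mismatching character is not looked up again from the root, so words can be missed.
    boolean resetScan(String text) {
        Node cur = root;
        for (int i = 0; i < text.length(); i++) {
            Node next = cur.children.get(text.charAt(i));
            cur = (next == null) ? root : next;
            if (cur.isEnd) {
                return true;
            }
        }
        return false;
    }

    // Fresh walk from the root at every start position i.
    boolean restartScan(String text) {
        for (int i = 0; i < text.length(); i++) {
            Node cur = root;
            for (int j = i; j < text.length(); j++) {
                cur = cur.children.get(text.charAt(j));
                if (cur == null) {
                    break; // no stored word starts at position i
                }
                if (cur.isEnd) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With cat stored, `resetScan("ccat")` returns false while `restartScan("ccat")` returns true. The restarting scan costs O(n·L) for a text of length n and a longest word of length L. For very large word lists and long texts, the Aho-Corasick automaton, which adds failure links to the same kind of trie, finds all matches in a single pass; that is a different technique from the one this post uses.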
The test output is as follows:
2. Using the trie to check for sensitive words when a topic is published
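First create three tables: m_user (users), m_topic (topics), and m_sensitive (sensitive words). Their structure is as follows:
Some of the table data is shown below:
Create a Maven project and add the following dependencies: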
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.young</groupId>
    <artifactId>trie01</artifactId>
    <version>1.0-SNAPSHOT</version>
    <parent>
        <artifactId>spring-boot-starter-parent</artifactId>
        <groupId>org.springframework.boot</groupId>
        <version>2.7.0</version>
    </parent>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
        </dependency>
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.83</version>
        </dependency>
    </dependencies>
    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
    </properties>
</project>
application.yml
server:
  port: 8089
spring:
  datasource:
    username: root
    password: 123456
    url: jdbc:mysql://localhost:3306/young?useSSL=false&serverTimezone=UTC
    driver-class-name: com.mysql.cj.jdbc.Driver
mybatis-plus:
  global-config:
    db-config:
      logic-not-delete-value: 0
      logic-delete-value: 1
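The entity classes are shown in the figure below. The User class has an isEnabled field that can be used to restrict user behavior: when a user repeatedly publishes topics containing inappropriate content, their isEnabled can be set to 0. To keep the demo simple, this feature is not implemented here.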
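The entity classes appear only as a screenshot in the original post. The following sketch reconstructs them as plain Java classes from the accessors used later in this post (getId, getUsername, getPassword, setUserId, getTitle, getContent, getWord). The id types, the Integer type of isEnabled, and the absence of annotations are assumptions. The real project presumably uses Lombok and MyBatis-Plus annotations such as @TableName("m_user") to map these classes to the tables created above; the sketch leaves them out so that it compiles without those libraries.

```java
// Hypothetical plain-Java versions of the entity classes; field types are assumptions.
class User {
    private Integer id;
    private String username;
    private String password;
    private Integer isEnabled; // assumed: 1 = enabled, 0 = disabled

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }
    public Integer getIsEnabled() { return isEnabled; }
    public void setIsEnabled(Integer isEnabled) { this.isEnabled = isEnabled; }
}

class Topic {
    private Integer id;
    private Integer userId; // set from the logged-in user before saving
    private String title;   // checked against the trie
    private String content; // checked against the trie

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public Integer getUserId() { return userId; }
    public void setUserId(Integer userId) { this.userId = userId; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}

class Sensitive {
    private Integer id;
    private String word; // one sensitive word per row

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public String getWord() { return word; }
    public void setWord(String word) { this.word = word; }
}
```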
The related mappers:
UserMapper.java
package com.young.mapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.young.entity.User;
import org.apache.ibatis.annotations.Mapper;
@Mapper
public interface UserMapper extends BaseMapper<User> {
}
SensitiveMapper.java
package com.young.mapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.young.entity.Sensitive;
import org.apache.ibatis.annotations.Mapper;
@Mapper
public interface SensitiveMapper extends BaseMapper<Sensitive> {
}
TopicMapper.java
package com.young.mapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.young.entity.Topic;
import org.apache.ibatis.annotations.Mapper;
@Mapper
public interface TopicMapper extends BaseMapper<Topic> {
}
UserService.java
package com.young.service;
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.young.entity.User;
import com.young.mapper.UserMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class UserService {
    @Autowired
    private UserMapper userMapper;

    // Look up the user by username and password (plain-text comparison, demo only)
    public User login(String username, String password) {
        LambdaQueryWrapper<User> queryWrapper = new LambdaQueryWrapper<>();
        queryWrapper.eq(User::getUsername, username)
                .eq(User::getPassword, password);
        return userMapper.selectOne(queryWrapper);
    }
}
TopicService.java
package com.young.service;
import com.young.entity.Topic;
import com.young.exception.BusinessException;
import com.young.mapper.TopicMapper;
import com.young.vo.TrieTree;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import javax.annotation.Resource;
@Service
public class TopicService {
    @Autowired
    private TopicMapper topicMapper;
    @Resource
    private TrieTree trieTree;

    public boolean saveTopic(Topic topic) {
        // Reject the topic if its title or content contains a sensitive word
        if (trieTree.match(topic.getTitle()) || trieTree.match(topic.getContent())) {
            throw new BusinessException("The content contains inappropriate words. Please comply with the relevant laws and regulations and help keep the online environment healthy!");
        }
        return topicMapper.insert(topic) > 0;
    }
}
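TopicService rejects the whole topic as soon as the title or content contains a sensitive word. If you would rather publish the topic with the offending words masked (an alternative this post does not implement), the same trie can report the length of the longest stored word starting at each position. Below is a minimal, self-contained sketch; SensitiveMasker, longestMatchAt, and mask are hypothetical names, and it replaces every character of each longest, non-overlapping match with '*'.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical masker built on the same trie idea as TrieTree.
class SensitiveMasker {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd; // true if a sensitive word ends here
    }

    private final Node root = new Node();

    void insert(String word) {
        if (word == null || word.isEmpty()) {
            return;
        }
        Node cur = root;
        for (char ch : word.toCharArray()) {
            cur = cur.children.computeIfAbsent(ch, k -> new Node());
        }
        cur.isEnd = true;
    }

    // Length of the longest stored word starting at position start, or 0 if there is none.
    private int longestMatchAt(String text, int start) {
        Node cur = root;
        int longest = 0;
        for (int j = start; j < text.length(); j++) {
            cur = cur.children.get(text.charAt(j));
            if (cur == null) {
                break;
            }
            if (cur.isEnd) {
                longest = j - start + 1;
            }
        }
        return longest;
    }

    // Replace each character of every longest, non-overlapping match with '*'.
    String mask(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int len = longestMatchAt(text, i);
            if (len > 0) {
                for (int k = i; k < i + len; k++) {
                    sb.setCharAt(k, '*');
                }
                i += len; // continue right after the masked word
            } else {
                i++;
            }
        }
        return sb.toString();
    }
}
```

With bad and badly stored, `mask("not badly done")` returns `"not ***** done"`: the longer word wins because longestMatchAt keeps walking after the first isEnd node.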
SensitiveService.java, which loads the sensitive words from the database:
package com.young.service;
import com.young.entity.Sensitive;
import com.young.mapper.SensitiveMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

@Service
public class SensitiveService {
    @Autowired
    private SensitiveMapper sensitiveMapper;

    // Load all rows of the sensitive-word table
    public List<Sensitive> getAllSensitive() {
        return sensitiveMapper.selectList(null);
    }

    // Extract the word field of every row
    public List<String> getAllSensitiveWord() {
        List<Sensitive> allSensitive = getAllSensitive();
        if (allSensitive != null && allSensitive.size() > 0) {
            return allSensitive.stream().map(Sensitive::getWord).collect(Collectors.toList());
        }
        return new ArrayList<>();
    }
}
TrieTreeConfig.java, which registers the TrieTree as a bean so that it can be injected later:
package com.young.config;
import com.young.service.SensitiveService;
import com.young.vo.TrieTree;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.List;
@Configuration
public class TrieTreeConfig {
    @Autowired
    private SensitiveService sensitiveService;

    // Build the trie once at startup from the words stored in the database
    @Bean
    public TrieTree constructTrieTree() {
        System.out.println("Initializing the trie ======================");
        List<String> words = sensitiveService.getAllSensitiveWord();
        TrieTree trieTree = new TrieTree();
        for (String word : words) {
            trieTree.insert(word);
        }
        return trieTree;
    }
}
BusinessException.java
package com.young.exception;
public class BusinessException extends RuntimeException {
    public BusinessException(String msg) {
        super(msg);
    }
}
GlobalExceptionHandler.java
package com.young.exception;
import com.young.util.ResultVOUtil;
import com.young.vo.ResultVO;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@RestControllerAdvice
@Slf4j
public class GlobalExceptionHandler {
    // Business errors, e.g. a topic that contains sensitive words
    @ExceptionHandler(BusinessException.class)
    public ResultVO businessExceptionHandler(BusinessException e) {
        // Pass e as the last argument (not as a {} value) so the stack trace is logged
        log.error("businessException: {}", e.getMessage(), e);
        return ResultVOUtil.fail(400, e.getMessage());
    }

    // Any other unexpected exception
    @ExceptionHandler(Exception.class)
    public ResultVO exceptionHandler(Exception e) {
        log.error("exception: {}", e.getMessage(), e);
        return ResultVOUtil.fail(400, e.getMessage());
    }
}
Related VO classes:
ResultVOUtil.java
package com.young.util;
import com.young.vo.ResultVO;
public class ResultVOUtil {
    public static <T> ResultVO<T> success() {
        return new ResultVO<>(200, "Success");
    }

    public static <T> ResultVO<T> success(T data) {
        return new ResultVO<>(200, "Success", data);
    }

    public static <T> ResultVO<T> fail() {
        return new ResultVO<>(400, "Failure");
    }

    public static <T> ResultVO<T> fail(Integer code, String msg) {
        return new ResultVO<>(code, msg);
    }
}
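ResultVO itself is not shown in the original post. Judging from the calls in ResultVOUtil, it has a (code, msg) constructor and a (code, msg, data) constructor; a plausible plain-Java sketch follows, with the field names code, msg, and data assumed. The getters matter because Jackson, the JSON library Spring Boot uses by default, reads them when serializing the object returned from a controller.

```java
// Hypothetical ResultVO matching the constructors that ResultVOUtil calls.
class ResultVO<T> {
    private Integer code; // e.g. 200 on success, 400 on failure
    private String msg;   // human-readable message
    private T data;       // optional payload, null when absent

    public ResultVO() {
    }

    public ResultVO(Integer code, String msg) {
        this(code, msg, null);
    }

    public ResultVO(Integer code, String msg, T data) {
        this.code = code;
        this.msg = msg;
        this.data = data;
    }

    public Integer getCode() { return code; }
    public void setCode(Integer code) { this.code = code; }
    public String getMsg() { return msg; }
    public void setMsg(String msg) { this.msg = msg; }
    public T getData() { return data; }
    public void setData(T data) { this.data = data; }
}
```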
DemoController.java. To keep the demo simple, the user information is stored in the session:
package com.young.controller;
import com.young.entity.Topic;
import com.young.entity.User;
import com.young.service.TopicService;
import com.young.service.UserService;
import com.young.util.ResultVOUtil;
import com.young.vo.ResultVO;
import com.young.vo.UserVO;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import javax.servlet.http.HttpServletRequest;
@RestController
@RequestMapping("/user")
public class DemoController {
    @Autowired
    private UserService userService;
    @Autowired
    private TopicService topicService;

    @PostMapping("/login")
    public ResultVO login(@RequestBody UserVO userVO, HttpServletRequest request) {
        User user = userService.login(userVO.getUsername(), userVO.getPassword());
        if (user == null) {
            return ResultVOUtil.fail(400, "Incorrect username or password");
        }
        // Keep the logged-in user in the session
        request.getSession().setAttribute("user", user);
        return ResultVOUtil.success(user);
    }

    @PostMapping("/topic")
    public ResultVO addTopic(@RequestBody Topic topic, HttpServletRequest request) {
        User user = (User) request.getSession().getAttribute("user");
        if (user == null) {
            return ResultVOUtil.fail(400, "User not logged in");
        }
        topic.setUserId(user.getId());
        // saveTopic throws a BusinessException (handled by GlobalExceptionHandler) on sensitive words
        if (topicService.saveTopic(topic)) {
            return ResultVOUtil.success();
        }
        return ResultVOUtil.fail(400, "Failed to publish the topic");
    }
}
Run the project and log in:
Publish a post that contains sensitive words: