1. Quick start
- JDK 1.7+
- Maven 3.x+
2. Maven dependency
<!-- https://mvnrepository.com/artifact/com.github.houbb/sensitive-word -->
<dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>sensitive-word</artifactId>
    <version>0.13.1</version>
</dependency>
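3. Spring integration and custom sensitive-word lists
Define MyWordAllow, the allow list (whitelist): anything it returns is not treated as a sensitive word.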
import com.github.houbb.heaven.util.io.StreamUtil;
import com.github.houbb.sensitive.word.api.IWordAllow;
import org.springframework.stereotype.Component;
import java.util.List;

@Component
public class MyWordAllow implements IWordAllow {
    @Override
    public List<String> allow() {
        // Load the custom allow list
        return StreamUtil.readAllLines("/backend_sensitive_word_allow.txt");
    }
}
Define MyWordDeny, the deny list: anything it returns is treated as a sensitive word.
import com.github.houbb.heaven.util.io.StreamUtil;
import com.github.houbb.sensitive.word.api.IWordDeny;
import org.springframework.stereotype.Component;
import java.util.List;

@Component
public class MyWordDeny implements IWordDeny {
    @Override
    public List<String> deny() {
        // Load the custom deny list
        return StreamUtil.readAllLines("/backend_sensitive_word_deny.txt");
    }
}
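File location: the paths used above start with /, so both files are expected at the root of the classpath (for example under src/main/resources).
Contents of the allow list [backend_sensitive_word_allow.txt]: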
duck shit chicken fowl sex sexy prostitute gender
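Both word files are loaded with readAllLines, which yields one list entry per line, so each word would normally sit on its own line. As a rough illustration only, here is a JDK-only sketch of that kind of parsing. WordListLoaderSketch and parseWords are hypothetical names invented for this example, and the trimming and blank-line skipping are assumptions, not documented StreamUtil behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of sensitive-word or heaven.
class WordListLoaderSketch {

    // Splits file content into words: one word per line, trimmed, blank lines skipped.
    static List<String> parseWords(String fileContent) {
        List<String> words = new ArrayList<>();
        for (String line : fileContent.split("\\r?\\n")) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Stand-in for the content of a word file such as backend_sensitive_word_allow.txt.
        String fileContent = "duck\n\n  gender  \nsexy\n";
        System.out.println(parseWords(fileContent)); // prints [duck, gender, sexy]
    }
}
```

In a real project the content would come from the classpath resource; the string literal here only keeps the example self-contained.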
Source code of the built-in deny list, for reference:
com.github.houbb.sensitive.word.support.deny.WordDenySystem.deny()
Define the configuration class: SpringSensitiveWordConfig
import com.github.houbb.sensitive.word.api.IWordAllow;
import com.github.houbb.sensitive.word.api.IWordDeny;
import com.github.houbb.sensitive.word.bs.SensitiveWordBs;
import com.github.houbb.sensitive.word.core.SensitiveWordHelper;
import com.github.houbb.sensitive.word.support.allow.WordAllows;
import com.github.houbb.sensitive.word.support.deny.WordDenys;
import com.github.houbb.sensitive.word.support.ignore.SensitiveWordCharIgnores;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
/**
 * @author : qinjinyuan
 * @desc : Spring configuration that builds the SensitiveWordBs bean
 * @date : 2024/03/11 10:05
 */
@Configuration
public class SpringSensitiveWordConfig {
    @Autowired
    private MyWordAllow myDdWordAllow;
    @Autowired
    private MyWordDeny myDdWordDeny;

    /**
     * Initializes the bootstrap class.
     *
     * @return the initialized bootstrap class
     * @since 1.0.0
     */
    @Bean
    public SensitiveWordBs sensitiveWordBs() {
        // Sensitive words = system list + custom list
        IWordDeny wordDeny = WordDenys.chains(WordDenys.defaults(), myDdWordDeny);
        // Allow list = system list + custom list
        IWordAllow wordAllow = WordAllows.chains(WordAllows.defaults(), myDdWordAllow);
        return SensitiveWordBs.newInstance()
                .wordAllow(wordAllow)
                .wordDeny(wordDeny)
                // Ignore special characters inserted between the letters of a word
                .charIgnore(SensitiveWordCharIgnores.specialChars())
                // Any other settings go here
                .numCheckLen(8)
                .init();
    }
}
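The comments in the configuration describe the deny list as system + custom, and the allow list the same way. The sketch below illustrates that idea with plain JDK collections. WordChainSketch, chain and effectiveDeny are hypothetical names, and the rule that an allow-list entry cancels a deny-list entry is an assumption made for the example; it does not describe how WordDenys.chains or WordAllows.chains are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of "system + custom" word lists; not the library's code.
class WordChainSketch {

    // Merges several word lists, keeping first-seen order and dropping duplicates.
    @SafeVarargs
    static List<String> chain(List<String>... sources) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> source : sources) {
            merged.addAll(source);
        }
        return new ArrayList<>(merged);
    }

    // Assumption: a word on the allow list is removed from the effective deny list.
    static List<String> effectiveDeny(List<String> deny, List<String> allow) {
        List<String> result = new ArrayList<>(deny);
        result.removeAll(allow);
        return result;
    }

    public static void main(String[] args) {
        List<String> systemDeny = Arrays.asList("badword", "gender");
        List<String> customDeny = Arrays.asList("spamlink", "badword");
        List<String> customAllow = Arrays.asList("gender");

        List<String> deny = chain(systemDeny, customDeny);
        System.out.println(effectiveDeny(deny, customAllow)); // prints [badword, spamlink]
    }
}
```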
4. Test
// Run the text through the configured sensitive-word lists
final String text2 = "F#U%C^K fuck gender the fuck bad fuck words.fuck";
SensitiveWordBs sensitiveWordBs = SpringUtils.getBean(SensitiveWordBs.class);
String result = sensitiveWordBs.replace(text2);
System.out.println(result);
Output:
******* **** gender the **** bad **** words*****
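5. Advanced: using it with Jackson annotations
With a tool this handy, how do we wire it into our system cleanly?
Define a deserializer class (Spring uses Jackson by default):
SensitiveDeserializer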
import cn.hutool.core.text.CharSequenceUtil;
import com.fasterxml.jackson.core.JacksonException;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.github.houbb.sensitive.word.bs.SensitiveWordBs;
import com.sikaryofficial.common.core.utils.SpringUtils;
import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.jackson.JsonComponent;
import java.io.IOException;
/**
 * @author : qinjinyuan
 * @desc : Custom deserializer that applies sensitive-word filtering
 * @date : 2023/12/14 9:53
 */
@Slf4j
@JsonComponent
public class SensitiveDeserializer extends JsonDeserializer<String> {
    /**
     * Deserializes a string and replaces any sensitive words in it.
     *
     * @param jsonParser parser positioned on the string value
     * @param deserializationContext Jackson deserialization context (unused)
     * @return the filtered text, or null if the input is blank
     * @throws IOException if the value cannot be read
     * @throws JacksonException if the JSON is malformed
     */
    @Override
    public String deserialize(JsonParser jsonParser, DeserializationContext deserializationContext) throws IOException, JacksonException {
        String text = jsonParser.getText();
        if (CharSequenceUtil.isBlank(text)) {
            return null;
        }
        SensitiveWordBs sensitiveWordBs = SpringUtils.getBean(SensitiveWordBs.class);
        return sensitiveWordBs.replace(text);
    }
}
Then add the Jackson annotation to whichever request DTO properties need filtering:
@JsonDeserialize(using = SensitiveDeserializer.class)
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;

public class XXXXReq {
    @JsonDeserialize(using = SensitiveDeserializer.class)
    private String remark;
}
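6. Summary
hutool also offers sensitive-word filtering and data masking, but in terms of efficiency and flexibility this open-source tool is the more practical choice:
Custom sensitive-word lists;
Custom allow lists;
Word-segmentation lookup;
Skipping of special characters;
Based on the DFA algorithm, with throughput of 70,000+ QPS, so the application does not notice the overhead;
...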
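To make the DFA idea and the special-character skipping more concrete, here is a heavily simplified sketch. The deny words are stored in a trie whose nodes act as automaton states, and the text is scanned one character at a time. DfaWordFilterSketch is a hypothetical class written for this post; it is not the implementation used by sensitive-word, and its rules (case-insensitive matching, skipping non-alphanumeric characters inside a word, stopping at whitespace) are assumptions chosen to keep the example short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified trie-based filter; not the library's implementation.
class DfaWordFilterSketch {

    // One automaton state: outgoing transitions plus an "end of word" flag.
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    DfaWordFilterSketch(List<String> denyWords) {
        for (String word : denyWords) {
            Node node = root;
            for (char c : word.toLowerCase().toCharArray()) {
                Node child = node.next.get(c);
                if (child == null) {
                    child = new Node();
                    node.next.put(c, child);
                }
                node = child;
            }
            node.end = true;
        }
    }

    // Assumption: symbols such as '#' or '%' are noise; whitespace still ends a word.
    private static boolean ignorable(char c) {
        return !Character.isLetterOrDigit(c) && !Character.isWhitespace(c);
    }

    // Returns the exclusive end index of the longest match starting at start, or -1.
    private int matchEnd(String text, int start) {
        Node node = root;
        int end = -1;
        for (int i = start; i < text.length(); i++) {
            char c = Character.toLowerCase(text.charAt(i));
            if (i > start && ignorable(c)) {
                continue; // skip noise characters inside a candidate word
            }
            node = node.next.get(c);
            if (node == null) {
                break;
            }
            if (node.end) {
                end = i + 1;
            }
        }
        return end;
    }

    // Replaces every matched span, including skipped noise characters, with '*'.
    String replace(String text) {
        StringBuilder out = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int end = matchEnd(text, i);
            if (end < 0) {
                i++;
                continue;
            }
            for (int j = i; j < end; j++) {
                out.setCharAt(j, '*');
            }
            i = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        DfaWordFilterSketch filter = new DfaWordFilterSketch(Arrays.asList("heck"));
        // prints "******* **** gender the ****"
        System.out.println(filter.replace("H#E%C^K heck gender the heck"));
    }
}
```

Each character of the input triggers at most one map lookup per active match, which is the property that makes this family of approaches fast. The real library adds much more on top, such as allow lists and the other features listed above.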
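Author: 老马啸西风; the project has 1.9k stars on GitHub.
Do the major platforms not even have a sensitive-word list? Getting started with sensitive-word, an open-source Java sensitive-word tool | Echo Blog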
https://github.com/houbb/sensitive-word/blob/master/CHANGE_LOG.md
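GitHub - houbb/sensitive-word: The sensitive word tool for java. (Sensitive, prohibited, illegal and profane words. A high-performance Java sensitive-word filtering framework based on the DFA algorithm. Please do not post content involving politics, advertising, marketing, firewall circumvention, or anything that violates national laws and regulations. A high-performance sensitive-word detection and filtering component that also supports Traditional/Simplified Chinese conversion, full-width/half-width conversion, Chinese-to-pinyin conversion, fuzzy search and more.)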