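1. Extracting specific data from text with a regular expression
The task is to extract data from a JSON structure: take the label and value of each entry and combine them into a new structure, e.g. 西瓜:0, 苹果:1, 草莓:2.
The original JSON string has the following format: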
[
{
"label": "西瓜",
"value": 0
},
{
"label": "苹果",
"value": 1
},
{
"label": "草莓",
"value": 2
},
{
"label": "柚子",
"value": 3
},
{
"label": "葡萄",
"value": 4
}
]
Regular expression:
"label":\s"(.*)",\s+"value":\s(\d+)
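Combined with the capture groups $1 and $2, this pattern extracts both values. In actual Java development, however, the format of the string changed (different spaces, line breaks, and so on), so the regular expression had to be refined.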
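As a rough illustration of the $1 and $2 groups, the sketch below applies the pattern above through Matcher.replaceAll, whose replacement string accepts $1 and $2 as group references. The class name GroupReplaceDemo and the method toPairs are made up for this example, and the one-entry input is shortened sample data; the text does not say which tool was originally used for the replacement.

```java
import java.util.regex.Pattern;

public class GroupReplaceDemo {
    // The first pattern from the text: exactly one whitespace character after
    // each colon, and at least one between the two fields.
    static final Pattern FIRST = Pattern.compile(
            "\"label\":\\s\"(.*)\",\\s+\"value\":\\s(\\d+)");

    // Rewrites every matched label/value pair as "label:value" using $1 and $2.
    static String toPairs(String json) {
        return FIRST.matcher(json).replaceAll("$1:$2");
    }

    public static void main(String[] args) {
        // One entry in the original multi-line layout (shortened sample).
        String json = "{\n  \"label\": \"西瓜\",\n  \"value\": 0\n}";
        System.out.println(toPairs(json));
    }
}
```

Only the matched part is replaced, so the braces and line breaks around each pair stay in the result.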
The JSON string showed up in the following variants:
Variant 1
[{"label":"西瓜","value":0},{"label":"苹果","value":1},{"label":"草莓","value":2},{"label":"柚子","value":3},{"label":"葡萄","value":4}]
Variant 2
[ { "label": "西瓜", "value": 0 }, { "label": "苹果", "value": 1 }, { "label": "草莓", "value": 2 }, { "label": "柚子", "value": 3 }, { "label": "葡萄", "value": 4 }]
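A quick way to check a pattern against all three forms is to count its matches. The sketch below does this for the first pattern; the class name VariantCheck and the helper countMatches are hypothetical, and the inputs are shortened to two entries each. The same helper can be pointed at any other candidate pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariantCheck {
    // The first pattern from the text.
    static final String FIRST = "\"label\":\\s\"(.*)\",\\s+\"value\":\\s(\\d+)";

    // Counts how often the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Shortened versions of the original string and the two variants.
        String original = "[\n  {\n    \"label\": \"西瓜\",\n    \"value\": 0\n  },\n"
                + "  {\n    \"label\": \"苹果\",\n    \"value\": 1\n  }\n]";
        String variant1 = "[{\"label\":\"西瓜\",\"value\":0},{\"label\":\"苹果\",\"value\":1}]";
        String variant2 = "[ { \"label\": \"西瓜\", \"value\": 0 }, { \"label\": \"苹果\", \"value\": 1 }]";

        System.out.println("original:  " + countMatches(FIRST, original));
        System.out.println("variant 1: " + countMatches(FIRST, variant1));
        System.out.println("variant 2: " + countMatches(FIRST, variant2));
    }
}
```

With these inputs the counts are 2, 0 and 1. Variant 1 has no whitespace after the colon, so nothing matches. On variant 2 the greedy (.*) runs to the last entry on the line, so both entries collapse into a single match.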
New regular expression:
"label":"(.{1,2})","value":(\d{1,2})
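This matches only variant 1; it does not fit the original JSON string or variant 2.
After further debugging, the pattern was changed to the following, which matches all three forms of the JSON string. Note that .{1,2} and \d{1,2} limit the label to at most two characters and the value to at most two digits, which is enough for this data.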
"label":\s?"(.{1,2})",\s*"value":\s?(\d{1,2})
2. The regular expression in detail
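One way to read the final pattern from section 1 piece by piece is to compile it with Pattern.COMMENTS, which ignores whitespace inside the pattern and treats everything from # to the end of the line as a comment. The sketch below is only an annotated rewrite of that same pattern; the class name RegexBreakdown and the helper firstPair are names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexBreakdown {
    // Each line holds one piece of the pattern followed by a '#' comment.
    static final Pattern LABEL_VALUE = Pattern.compile(
            "\"label\":     # the literal text \"label\":\n"
          + "\\s?           # at most one whitespace character\n"
          + "\"(.{1,2})\"   # group 1: one or two characters inside quotes\n"
          + ",              # the comma between the two fields\n"
          + "\\s*           # any amount of whitespace, line breaks included\n"
          + "\"value\":     # the literal text \"value\":\n"
          + "\\s?           # at most one whitespace character\n"
          + "(\\d{1,2})     # group 2: one or two digits\n",
            Pattern.COMMENTS);

    // Returns "label:value" for the first match, or null if nothing matches.
    static String firstPair(String json) {
        Matcher m = LABEL_VALUE.matcher(json);
        return m.find() ? m.group(1) + ":" + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstPair("{ \"label\": \"西瓜\", \"value\": 0 }"));
    }
}
```

The \s* between the comma and "value" is what lets the pattern span the line break in the original multi-line form, while the two \s? pieces cover both the forms with a space after the colon and the one without.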
3. Extracting the result with a Java regular expression
import lombok.extern.slf4j.Slf4j;
import org.junit.Test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
@Slf4j
public class RegularExpressionTest {
    /**
     * Extracts the label and value of each entry.
     * <p>
     * [
     * {
     * "label": "西瓜",
     * "value": 0
     * },
     * {
     * "label": "苹果",
     * "value": 1
     * },
     * {
     * "label": "草莓",
     * "value": 2
     * },
     * {
     * "label": "柚子",
     * "value": 3
     * },
     * {
     * "label": "葡萄",
     * "value": 4
     * }
     * ]
     * <p>
     * Because the JSON string is copied back and forth between different tools, it shows up in different
     * formats, for example with different spaces and line breaks. The two variants encountered in practice are:
     * <p>
     * [{"label":"西瓜","value":0},{"label":"苹果","value":1},{"label":"草莓","value":2},{"label":"柚子","value":3},{"label":"葡萄","value":4}]
     * <p>
     * [ { "label": "西瓜", "value": 0 }, { "label": "苹果", "value": 1 }, { "label": "草莓", "value": 2 }, { "label": "柚子", "value": 3 }, { "label": "葡萄", "value": 4 }]
     *
     * @author fangyunhe
     * @time 2023/9/20 10:40
     */
    @Test
    public void test() {
        // Original multi-line form of the JSON string.
        String jsonStr = "[\n" +
                " {\n" +
                " \"label\": \"西瓜\",\n" +
                " \"value\": 0\n" +
                " },\n" +
                " {\n" +
                " \"label\": \"苹果\",\n" +
                " \"value\": 1\n" +
                " },\n" +
                " {\n" +
                " \"label\": \"草莓\",\n" +
                " \"value\": 2\n" +
                " },\n" +
                " {\n" +
                " \"label\": \"柚子\",\n" +
                " \"value\": 3\n" +
                " },\n" +
                " {\n" +
                " \"label\": \"葡萄\",\n" +
                " \"value\": 4\n" +
                " }\n" +
                "]";
        // Variants 1 and 2; swap one in to check that the same pattern matches it.
        //String jsonStr = "[{\"label\":\"西瓜\",\"value\":0},{\"label\":\"苹果\",\"value\":1},{\"label\":\"草莓\",\"value\":2},{\"label\":\"柚子\",\"value\":3},{\"label\":\"葡萄\",\"value\":4}]";
        //String jsonStr = "[ { \"label\": \"西瓜\", \"value\": 0 }, { \"label\": \"苹果\", \"value\": 1 }, { \"label\": \"草莓\", \"value\": 2 }, { \"label\": \"柚子\", \"value\": 3 }, { \"label\": \"葡萄\", \"value\": 4 }]";
        // Final pattern: group 1 is the label, group 2 is the value.
        String patternStr = "\"label\":\\s?\"(.{1,2})\",\\s*\"value\":\\s?(\\d{1,2})";
        Pattern pattern = Pattern.compile(patternStr);
        Matcher matcher = pattern.matcher(jsonStr);
        // Log every label/value pair as "label:value".
        while (matcher.find()) {
            log.info("{}:{}", matcher.group(1), matcher.group(2));
        }
    }
}
Output:
17:08:50.684 [main] INFO demo.test.RegularExpressionTest - 西瓜:0
17:08:50.890 [main] INFO demo.test.RegularExpressionTest - 苹果:1
17:08:51.453 [main] INFO demo.test.RegularExpressionTest - 草莓:2
17:08:51.838 [main] INFO demo.test.RegularExpressionTest - 柚子:3
17:08:52.406 [main] INFO demo.test.RegularExpressionTest - 葡萄:4
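To build the new structure mentioned at the start rather than only logging it, the matches can be collected into a map. The sketch below is a variation, not the code from the test above: it assumes labels never contain a double quote, and it relaxes the pattern to [^"]+ and \s* so that longer labels and whitespace around the colons are accepted as well. LabelValueExtractor and extract are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelValueExtractor {
    // Relaxed variant (an assumption of this sketch, not the pattern above):
    // any label without a double quote, any run of digits, and optional
    // whitespace around the colons and the comma.
    private static final Pattern PAIR = Pattern.compile(
            "\"label\"\\s*:\\s*\"([^\"]+)\"\\s*,\\s*\"value\"\\s*:\\s*(\\d+)");

    // Returns label -> value in the order the pairs appear in the input.
    static Map<String, Integer> extract(String json) {
        Map<String, Integer> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            // Integer.parseInt throws if the digits do not fit into an int.
            result.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        String json = "[{\"label\":\"西瓜\",\"value\":0}, { \"label\": \"苹果\", \"value\": 1 }]";
        System.out.println(extract(json)); // prints {西瓜=0, 苹果=1}
    }
}
```

A LinkedHashMap keeps the pairs in input order. For input that goes beyond this fixed shape (escaped quotes, nested objects, fields in a different order), a JSON parser is likely the safer choice.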