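In the previous post we covered the basics of the Query DSL. In this one we look at structured search in Elasticsearch and put the DSL from last time into practice.
What is structured search?
The following explanation is excerpted from Elasticsearch: The Definitive Guide:
Structured search is about interrogating data that has inherent structure. Dates, times, and numbers are all structured: they have a precise format that you can perform logical operations on. Common operations include comparing ranges of numbers or dates, or determining which of two values is larger. Text can be structured too. A box of crayons has a discrete set of colors: red, green, blue. A blog post may be tagged with the keywords distributed and search. Products in an e-commerce store have Universal Product Codes (UPCs) or some other identifier that requires strict formatting.
With structured search, the answer to your question is always a yes or no; something either belongs in the set or it does not. Structured search does not worry about document relevance or scoring; it simply includes or excludes documents. This should make sense logically: a number can't be more in a range than any other number that falls in the same range. It is either in the range or it isn't. Similarly, for structured text, a value is either equal or it isn't. There is no concept of more similar.
From this definition we can see that what matters most in a structured query is whether a document matches; relevance and score calculation are of little concern. In the sections below we go through the different structured queries one by one.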
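Term query
The term query looks up exact values. An exact match needs no relevance scoring, and when the clause runs in a filter context the score is skipped entirely and the result can be cached, so it is fast. The term query is mainly intended for numbers, dates, booleans, and not_analyzed strings (text that is not analyzed; in recent versions this is the keyword type).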
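Since a structured query only cares about match or no match, the term clause is often placed in a filter context. The following is a minimal sketch of that idea, assuming the `bank` index and `age` field used in this post. The helper name `term_filter_body` is hypothetical, and the snippet only builds the request body; it does not contact a cluster.

```python
import json


def term_filter_body(field, value):
    """Build a search body that runs a term clause in filter context.

    Hypothetical helper for illustration; it only builds the JSON body.
    """
    return {
        "query": {
            "bool": {
                # Clauses under "filter" do not contribute to _score.
                "filter": [
                    {"term": {field: {"value": value}}}
                ]
            }
        }
    }


body = term_filter_body("age", 36)
# Print a body that can be pasted under `GET bank/_search` in the Console.
print(json.dumps(body, indent=2))
```

The printed JSON has the same term clause as the requests in this post, only wrapped in bool/filter so that no scoring is involved.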
First, let's try a term query:
GET bank/_search
{
  "query": {
    "term": {
      "firstname": {
        "value": "Burton"
      }
    }
  },
  "profile": "true"
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"profile" : {
"shards" : [
{
"id" : "[jW8PbSdhTOOpESX13DRBJQ][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "TermQuery",
"description" : "firstname:Burton",
"time_in_nanos" : 8729,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 2,
"create_weight" : 7274,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 1455
}
}
],
"rewrite_time" : 1222,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 3132
}
]
}
],
"aggregations" : [ ]
}
]
}
}
The result is not what we hoped for: there are no hits. Let's try again, this time searching on the age field:
GET bank/_search
{
  "query": {
    "term": {
      "age": {
        "value": 36
      }
    }
  },
  "profile": "true"
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 52,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "361",
"_score" : 1.0,
"_source" : {
"account_number" : 361,
"balance" : 23659,
"firstname" : "Noreen",
"lastname" : "Shelton",
"age" : 36,
"gender" : "M",
"address" : "702 Tillary Street",
"employer" : "Medmex",
"email" : "noreenshelton@medmex.com",
"city" : "Derwood",
"state" : "NH"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "378",
"_score" : 1.0,
"_source" : {
"account_number" : 378,
"balance" : 27100,
"firstname" : "Watson",
"lastname" : "Simpson",
"age" : 36,
"gender" : "F",
"address" : "644 Thomas Street",
"employer" : "Wrapture",
"email" : "watsonsimpson@wrapture.com",
"city" : "Keller",
"state" : "TX"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "397",
"_score" : 1.0,
"_source" : {
"account_number" : 397,
"balance" : 37418,
"firstname" : "Leonard",
"lastname" : "Gray",
"age" : 36,
"gender" : "F",
"address" : "840 Morgan Avenue",
"employer" : "Recritube",
"email" : "leonardgray@recritube.com",
"city" : "Edenburg",
"state" : "AL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "455",
"_score" : 1.0,
"_source" : {
"account_number" : 455,
"balance" : 39556,
"firstname" : "Lynn",
"lastname" : "Tran",
"age" : 36,
"gender" : "M",
"address" : "741 Richmond Street",
"employer" : "Optyk",
"email" : "lynntran@optyk.com",
"city" : "Clinton",
"state" : "WV"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[jW8PbSdhTOOpESX13DRBJQ][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "PointRangeQuery",
"description" : "age:[36 TO 36]",
"time_in_nanos" : 101095,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 10111,
"match" : 0,
"next_doc_count" : 52,
"score_count" : 52,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 2175,
"advance_count" : 2,
"score" : 4530,
"build_scorer_count" : 4,
"create_weight" : 2346,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 81933
}
}
],
"rewrite_time" : 1248,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 76285
}
]
}
],
"aggregations" : [ ]
}
]
}
}
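These results confirm once more that the term query is meant for exact values: numbers, booleans, dates, and fields that are not analyzed. If we use term to search for words or sentences in an analyzed text field, it tends to match nothing. Why? The official documentation explains it: the default standard analyzer changes the text at index time, for example by lowercasing it and splitting it into tokens, while the term query is an exact lookup that does not analyze the search term. The value we search for therefore no longer equals what is stored in the index, and term rarely gives satisfying results for words and phrases. The documentation warns against this usage and recommends the match query for words and phrases instead.
(Link to the official term query documentation.)
[Figure: screenshot of the warning in the official term query documentation.]
I won't translate it here; if you want to learn more, read the official documentation.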
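The effect described above can be reproduced without a cluster. The following is a simplified sketch, not the real analyzer: `simple_analyze` is a hypothetical stand-in that only splits on word characters and lowercases, and the two documents are made up for the example.

```python
import re


def simple_analyze(text):
    """Rough stand-in for the standard analyzer: tokenize and lowercase.

    Assumption: the real analyzer does more than this; it is only an
    illustration of why the indexed token differs from the original text.
    """
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def build_inverted_index(docs, field):
    """Map each indexed token to the set of document ids containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in simple_analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_lookup(index, value):
    """term-style lookup: the value is used as-is, with no analysis."""
    return index.get(value, set())


def match_lookup(index, text):
    """match-style lookup: analyze the input first, then OR the tokens."""
    hits = set()
    for token in simple_analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {
    "1": {"firstname": "Burton"},  # made-up sample documents
    "2": {"firstname": "Hattie"},
}
index = build_inverted_index(docs, "firstname")

print(term_lookup(index, "Burton"))   # set(): the index holds "burton"
print(term_lookup(index, "burton"))   # {'1'}
print(match_lookup(index, "Burton"))  # {'1'}
```

Against the real index, the corresponding options would be a match query on firstname, or a term query on a keyword sub-field such as firstname.keyword. The latter assumes the index was created with the default dynamic mapping, so check the mapping first.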
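Terms query
The terms query is essentially several term queries on the same field combined: a document matches if the field equals any of the listed values. For example:
GET bank/_search
{
  "query": {
    "terms": {
      "age": [36, 27]
    }
  },
  "profile": "true"
}
Response: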
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 91,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "20",
"_score" : 1.0,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "elinorratliff@scentric.com",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "102",
"_score" : 1.0,
"_source" : {
"account_number" : 102,
"balance" : 29712,
"firstname" : "Dena",
"lastname" : "Olson",
"age" : 27,
"gender" : "F",
"address" : "759 Newkirk Avenue",
"employer" : "Hinway",
"email" : "denaolson@hinway.com",
"city" : "Choctaw",
"state" : "NJ"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "133",
"_score" : 1.0,
"_source" : {
"account_number" : 133,
"balance" : 26135,
"firstname" : "Deena",
"lastname" : "Richmond",
"age" : 36,
"gender" : "F",
"address" : "646 Underhill Avenue",
"employer" : "Sunclipse",
"email" : "deenarichmond@sunclipse.com",
"city" : "Austinburg",
"state" : "SC"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "222",
"_score" : 1.0,
"_source" : {
"account_number" : 222,
"balance" : 14764,
"firstname" : "Rachelle",
"lastname" : "Rice",
"age" : 36,
"gender" : "M",
"address" : "333 Narrows Avenue",
"employer" : "Enaut",
"email" : "rachellerice@enaut.com",
"city" : "Wright",
"state" : "AZ"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "239",
"_score" : 1.0,
"_source" : {
"account_number" : 239,
"balance" : 25719,
"firstname" : "Chang",
"lastname" : "Boyer",
"age" : 36,
"gender" : "M",
"address" : "895 Brigham Street",
"employer" : "Qaboos",
"email" : "changboyer@qaboos.com",
"city" : "Belgreen",
"state" : "NH"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "328",
"_score" : 1.0,
"_source" : {
"account_number" : 328,
"balance" : 12523,
"firstname" : "Good",
"lastname" : "Campbell",
"age" : 27,
"gender" : "F",
"address" : "438 Hicks Street",
"employer" : "Gracker",
"email" : "goodcampbell@gracker.com",
"city" : "Marion",
"state" : "CA"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "342",
"_score" : 1.0,
"_source" : {
"account_number" : 342,
"balance" : 33670,
"firstname" : "Vivian",
"lastname" : "Wells",
"age" : 36,
"gender" : "M",
"address" : "570 Cobek Court",
"employer" : "Nutralab",
"email" : "vivianwells@nutralab.com",
"city" : "Fontanelle",
"state" : "OK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "361",
"_score" : 1.0,
"_source" : {
"account_number" : 361,
"balance" : 23659,
"firstname" : "Noreen",
"lastname" : "Shelton",
"age" : 36,
"gender" : "M",
"address" : "702 Tillary Street",
"employer" : "Medmex",
"email" : "noreenshelton@medmex.com",
"city" : "Derwood",
"state" : "NH"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "378",
"_score" : 1.0,
"_source" : {
"account_number" : 378,
"balance" : 27100,
"firstname" : "Watson",
"lastname" : "Simpson",
"age" : 36,
"gender" : "F",
"address" : "644 Thomas Street",
"employer" : "Wrapture",
"email" : "watsonsimpson@wrapture.com",
"city" : "Keller",
"state" : "TX"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[jW8PbSdhTOOpESX13DRBJQ][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "PointInSetQuery",
"description" : "age:{27 36}",
"time_in_nanos" : 864841,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 3060,
"match" : 0,
"next_doc_count" : 91,
"score_count" : 91,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 2277,
"advance_count" : 2,
"score" : 2482,
"build_scorer_count" : 4,
"create_weight" : 52628,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 804394
}
}
],
"rewrite_time" : 1505,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 17392
}
]
}
],
"aggregations" : [ ]
}
]
}
}
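The way terms combines several exact checks can be sketched in a few lines. This is only an illustration of the matching logic, not of how Elasticsearch executes the query. The function names are hypothetical, the first two documents are trimmed copies of hits from the response above, and the third is made up.

```python
def term_matches(doc, field, value):
    """Exact comparison of one field against one value."""
    return doc.get(field) == value


def terms_matches(doc, field, values):
    """terms behaves like an OR over several term checks on the same field."""
    return any(term_matches(doc, field, v) for v in values)


def terms_as_bool_should(field, values):
    """Build a bool/should clause that selects the same documents.

    The matching set is the same; the scores may differ from a terms query.
    """
    return {"bool": {"should": [{"term": {field: v}} for v in values]}}


docs = [
    {"account_number": 6, "age": 36},
    {"account_number": 102, "age": 27},
    {"account_number": 9999, "age": 40},  # made-up non-matching document
]
hits = [d["account_number"] for d in docs if terms_matches(d, "age", [36, 27])]
print(hits)  # [6, 102]
print(terms_as_bool_should("age", [36, 27]))
```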
The caveats for term apply here as well: terms is also an exact query. It is also worth a brief mention of terms lookup. Here is a lookup example:
GET bank/_search
{
  "query": {
    "terms": {
      "age": {
        "index": "bank",
        "id": "342",
        "path": "age"
      }
    }
  },
  "profile": "true"
}
Response:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 52,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "20",
"_score" : 1.0,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "elinorratliff@scentric.com",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "133",
"_score" : 1.0,
"_source" : {
"account_number" : 133,
"balance" : 26135,
"firstname" : "Deena",
"lastname" : "Richmond",
"age" : 36,
"gender" : "F",
"address" : "646 Underhill Avenue",
"employer" : "Sunclipse",
"email" : "deenarichmond@sunclipse.com",
"city" : "Austinburg",
"state" : "SC"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "222",
"_score" : 1.0,
"_source" : {
"account_number" : 222,
"balance" : 14764,
"firstname" : "Rachelle",
"lastname" : "Rice",
"age" : 36,
"gender" : "M",
"address" : "333 Narrows Avenue",
"employer" : "Enaut",
"email" : "rachellerice@enaut.com",
"city" : "Wright",
"state" : "AZ"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "239",
"_score" : 1.0,
"_source" : {
"account_number" : 239,
"balance" : 25719,
"firstname" : "Chang",
"lastname" : "Boyer",
"age" : 36,
"gender" : "M",
"address" : "895 Brigham Street",
"employer" : "Qaboos",
"email" : "changboyer@qaboos.com",
"city" : "Belgreen",
"state" : "NH"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "342",
"_score" : 1.0,
"_source" : {
"account_number" : 342,
"balance" : 33670,
"firstname" : "Vivian",
"lastname" : "Wells",
"age" : 36,
"gender" : "M",
"address" : "570 Cobek Court",
"employer" : "Nutralab",
"email" : "vivianwells@nutralab.com",
"city" : "Fontanelle",
"state" : "OK"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "361",
"_score" : 1.0,
"_source" : {
"account_number" : 361,
"balance" : 23659,
"firstname" : "Noreen",
"lastname" : "Shelton",
"age" : 36,
"gender" : "M",
"address" : "702 Tillary Street",
"employer" : "Medmex",
"email" : "noreenshelton@medmex.com",
"city" : "Derwood",
"state" : "NH"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "378",
"_score" : 1.0,
"_source" : {
"account_number" : 378,
"balance" : 27100,
"firstname" : "Watson",
"lastname" : "Simpson",
"age" : 36,
"gender" : "F",
"address" : "644 Thomas Street",
"employer" : "Wrapture",
"email" : "watsonsimpson@wrapture.com",
"city" : "Keller",
"state" : "TX"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "397",
"_score" : 1.0,
"_source" : {
"account_number" : 397,
"balance" : 37418,
"firstname" : "Leonard",
"lastname" : "Gray",
"age" : 36,
"gender" : "F",
"address" : "840 Morgan Avenue",
"employer" : "Recritube",
"email" : "leonardgray@recritube.com",
"city" : "Edenburg",
"state" : "AL"
}
},
{
"_index" : "bank",
"_type" : "account",
"_id" : "455",
"_score" : 1.0,
"_source" : {
"account_number" : 455,
"balance" : 39556,
"firstname" : "Lynn",
"lastname" : "Tran",
"age" : 36,
"gender" : "M",
"address" : "741 Richmond Street",
"employer" : "Optyk",
"email" : "lynntran@optyk.com",
"city" : "Clinton",
"state" : "WV"
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[jW8PbSdhTOOpESX13DRBJQ][bank][0]",
"searches" : [
{
"query" : [
{
"type" : "PointInSetQuery",
"description" : "age:{36}",
"time_in_nanos" : 137631,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 1715,
"match" : 0,
"next_doc_count" : 52,
"score_count" : 52,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 1762,
"advance_count" : 2,
"score" : 1462,
"build_scorer_count" : 4,
"create_weight" : 1747,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 130945
}
}
],
"rewrite_time" : 1454,
"collector" : [
{
"name" : "SimpleTopScoreDocCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 11493
}
]
}
],
"aggregations" : [ ]
}
]
}
}
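As I understand it, the lookup fetches the value of the path field from the document with the given id in the given index and uses it as the terms list, so the query returns every document whose field matches that value. Here document 342 has age 36, so the 52 hits are the same as for the term query on age 36 above. It is efficient, but it comes with quite a few restrictions; see the official terms query documentation for the details.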
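The two steps of a lookup (resolve the referenced document, then match) can be sketched as follows. This is a simplified in-memory illustration under my reading of the feature, not the real implementation. The names `get_path` and `terms_lookup` are hypothetical, and the documents are trimmed copies of ones shown in this post.

```python
def get_path(source, path):
    """Follow a dotted path such as "a.b" through nested dicts."""
    value = source
    for key in path.split("."):
        value = value[key]
    return value


def terms_lookup(index_docs, field, lookup_id, path):
    """Resolve the lookup document first, then run a terms-style match."""
    looked_up = get_path(index_docs[lookup_id], path)
    # A single value is treated as a one-element terms list.
    values = looked_up if isinstance(looked_up, list) else [looked_up]
    return sorted(
        doc_id for doc_id, src in index_docs.items() if src.get(field) in values
    )


bank = {
    "342": {"firstname": "Vivian", "age": 36},
    "6": {"firstname": "Hattie", "age": 36},
    "102": {"firstname": "Dena", "age": 27},
}
# Document 342 has age 36, so every document with age 36 matches.
print(terms_lookup(bank, "age", "342", "age"))  # ['342', '6']
```

Note that the looked-up document itself is part of the result, which is also the case in the real response above (id 342 appears among the hits).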
That's all for now; we will continue with more queries in later posts.