Elasticsearch 部分匹配
时间:2017-12-16 20:59:55
参考:
环境:
- 系统版本:Win10 64位
- Elasticsearch版本:6.0.0
部分匹配#
- 前缀匹配
- 实时搜索,根据输入信息实时搜索
- 长组合词
构造数据#
-
指定字段数据类型以及分析器
PUT /my_index/ { "settings": { "number_of_shards": 1, "analysis": { "filter": { "autocomplete_filter": { "type": "edge_ngram", "min_gram": 1, "max_gram": 20 } }, "analyzer": { "autocomplete": { "type": "custom", "tokenizer": "standard", "filter": [ "lowercase", "autocomplete_filter" ] } } } }, "mappings": { "address": { "properties": { "postcode": { "type": "keyword" }, "name": { "type": "text", "analyzer": "autocomplete" } } } } }
注:
autocomplete
分析器会把单词按如下形式分割quick
会分解成q
qu
qui
quic
quick
。 -
插入数据
POST /my_index/address/_bulk { "index": { "_id": 1 }} { "postcode": "W1V 3DG","title": "Sue ate the alligator", "name": "Brown foxes"} { "index": { "_id": 2 }} { "postcode": "W2F 8HW" ,"title": "The alligator ate Sue", "name":"Kobe Briant" } { "index": { "_id": 3 }} { "postcode": "WC1N 1LZ","title" : "Sue never goes anywhere without her alligator skin purse", "name": "Yellow furballs"} { "index": { "_id": 3 }} { "postcode": "SW5 0BE","title":"The alligator eat Sue", "name":"Sun Feilong" }
1
数据含义:这些数据是美国邮编,以 `W1V 3DG` 为例:
W1V: 这是邮编的外部,它定义了邮件的区域和行政区:
W 代表区域( 1 或 2 个字母)
1V 代表行政区( 1 或 2 个数字,可能跟着一个字符)3DG: 内部定义了街道或建筑:
3 代表街区区块( 1 个数字)
DG 代表单元( 2 个字母)
查询示例#
-
前缀查询
prefix
。 根据前缀进行匹配。 -
模糊查询
wildcard
。下面查询匹配以W开头,后面一个任意字符,后面是F,再后面是任意个任意字符后面跟着 HW 的数据。 -
正则查询
regexp
。下面查询匹配以W开头,后面跟一个数字,后面至少有一个任意字符的数据,W1
不会匹配。 -
短语前缀匹配
match_phrase_prefix
。 下面查询会匹配 title 包含the alligator at*
的内容。 -
短语前缀匹配
match_phrase_prefix
指定slop
,。只有最后一个词才能当前缀使用。GET /my_index/address/_search { "query": { "match_phrase_prefix": { "title": { "query": "The alligator at", "slop" :10 } } } }
查询结果:
"hits": [ { "_index": "my_index", "_type": "address", "_id": "2", "_score": 0.8630463, "_source": { "postcode": "W2F 8HW", "title": "The alligator ate Sue" } }, { "_index": "my_index", "_type": "address", "_id": "1", "_score": 0.3273624, "_source": { "postcode": "W1V 3DG", "title": "Sue ate the alligator" } } ]
-
短语前缀匹配
match_phrase_prefix
指定slop
,同时限制前缀查询查询数据量。 -
n-grams
边界查询。对单词进行特殊的切分,如quick
会分解成q
qu
qui
quic
quick
。查询结果:
"hits": [ { "_index": "my_index", "_type": "address", "_id": "1", "_score": 2.0325346, "_source": { "postcode": "W1V 3DG", "title": "Sue ate the alligator", "name": "Brown foxes" } }, { "_index": "my_index", "_type": "address", "_id": "2", "_score": 1.2379456, "_source": { "postcode": "W2F 8HW", "title": "The alligator ate Sue", "name": "Kobe Briant" } }, { "_index": "my_index", "_type": "address", "_id": "3", "_score": 0.5361201, "_source": { "postcode": "SW5 0BE", "title": "The alligator eat Sue", "name": "Sun Feilong" } } ]
匹配结果和预期的不一样,是因为我们查询参数也是用了
autocomplete
分析器,使用下面的语句进行验证:json GET /my_index/address/_validate/query?explain { "query": { "match": { "name": "brown fo" } } }
验证结果:
"explanations": [ { "index": "my_index", "valid": true, "explanation": "+(Synonym(name:b name:br name:bro name:brow name:brown) Synonym(name:f name:fo)) #*:*" } ]
为了查找到想要的结果,我们可以在查询时指定其他分析器或者再定义索引的时候,配置相应的查询解析器
方案一:查询时,指定查询参数的分析器
GET /my_index/address/_search { "query": { "match": { "name": { "query": "brown fo", "analyzer": "standard" } } } }
方案二(推荐):创建索引时指定对应的查询分析器。
-
解决长组合词问题,长组合词时多个单词包含连接起来组成的一个单词,例如
MyNameIsSunFeiLong
在分词的时候,会被分割成一个完整的单词,如果查询name
或其包含的任意一个单词,并不会与之匹配,我们可以通过分词器把长词组合分割成一段段的单词,如myn
yna
nam
ame
... ,我们查询的词也被分割成nam
ame
,这样就能和结果匹配,为了达到这种效果,我们需要自定义的分析器。定义分析器:
PUT /my_index { "settings": { "analysis": { "filter": { "trigrams_filter": { "type": "ngram", "min_gram": 3, "max_gram": 3 } }, "analyzer": { "trigrams": { "type": "custom", "tokenizer": "standard", "filter": [ "lowercase", "trigrams_filter" ] } } } }, "mappings": { "my_type": { "properties": { "text": { "type": "text", "analyzer": "trigrams" } } } } }
插入数据
POST /my_index/my_type/_bulk { "index": { "_id": 1 }} { "text": "Aussprachewörterbuch" } { "index": { "_id": 2 }} { "text": "Militärgeschichte" } { "index": { "_id": 3 }} { "text": "Weißkopfseeadler" } { "index": { "_id": 4 }} { "text": "Weltgesundheitsorganisation" } { "index": { "_id": 5 }} { "text": "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz" }
测试查询:
查询结果: