Elasticsearch - Fetch Distinct Tags
I have documents in the following format:
{
_id :"1",
tags:["guava","apple","mango", "banana", "gulmohar"]
}
{
_id:"2",
tags: ["orange","guava", "mango shakes", "apple pie", "grammar"]
}
{
_id:"3",
tags: ["apple","grapes", "water", "gulmohar","water-melon", "green"]
}
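Now I want to fetch the unique tag values, across the 'tags' field of all documents, that start with the prefix g*, so that a tag suggester can display them (the Stack Overflow site is an example).
For example, whenever the user types "g", the result "guava", "gulmohar", "grammar", "grapes" and "green" should be returned.
I.e., the query should return the distinct tags that have the prefix g*.
I have looked everywhere, browsed the whole documentation, and searched the ES forum, but to my dismay I have not found any clue.
I tried aggregations, but the aggregation returns distinct counts for every word/token in the tags field. It does not return the list of unique tags starting with 'g'.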
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"allow_leading_wildcard": false,
"fields": [
"tags"
],
"query": "g*",
"fuzziness":0
}
}
]
}
},
"filter": {
//some condition on other field...
}
}
},
"aggs": {
"distinct_tags": {
"terms": {
"field": "tags",
"size": 10
}
}
},
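The above results in: guava(w), apple(q), mango(1), ...
Can someone suggest the correct way to fetch all distinct tags with the prefix input_prefix*?
It's a bit of a hack, but this seems to accomplish what you want.
I created an index and added your documents: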
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"tags":["guava","apple","mango", "banana", "gulmohar"]}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"tags": ["orange","guava", "mango shakes", "apple pie", "grammar"]}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"tags": ["guava","apple","grapes", "water", "grammar","gulmohar","water-melon", "green"]}
Then I used a combination of a prefix query and highlighting, as follows:
POST /test_index/_search
{
"query": {
"prefix": {
"tags": {
"value": "g"
}
}
},
"fields": [ ],
"highlight": {
"pre_tags": [""],
"post_tags": [""],
"fields": {
"tags": {}
}
}
}
...
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"highlight": {
"tags": [
"guava",
"gulmohar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grammar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grapes",
"grammar",
"gulmohar",
"green"
]
}
}
]
}
}
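Note that the highlights come back per document, so the same tag (for example "guava") appears under several hits. A minimal sketch of merging them on the client side, assuming the response has already been parsed into a Python dict; the helper name distinct_highlighted_tags is hypothetical and not part of any Elasticsearch client:

```python
from typing import Any, Dict, List


def distinct_highlighted_tags(response: Dict[str, Any], field: str = "tags") -> List[str]:
    """Collect unique highlight fragments for `field`, keeping first-seen order."""
    seen = set()
    unique: List[str] = []
    for hit in response.get("hits", {}).get("hits", []):
        # Hits without a highlight section are simply skipped.
        for fragment in hit.get("highlight", {}).get(field, []):
            if fragment not in seen:
                seen.add(fragment)
                unique.append(fragment)
    return unique


# Trimmed copy of the response shown above (only the parts the helper reads).
sample_response = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "1", "highlight": {"tags": ["guava", "gulmohar"]}},
            {"_id": "2", "highlight": {"tags": ["guava", "grammar"]}},
            {"_id": "3", "highlight": {"tags": ["guava", "grapes", "grammar", "gulmohar", "green"]}},
        ],
    }
}

print(distinct_highlighted_tags(sample_response))
# ['guava', 'gulmohar', 'grammar', 'grapes', 'green']
```

This only de-duplicates the hits that were actually returned, so the search request would need a large enough size to cover every matching document.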
Here is the code I used:
http://sense.qbox.io/gist/c14675ee8bd3934389a6cb0c85ff57621a17bf11
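Of course, what you're trying to do amounts to autocomplete, and there are probably better ways to do it than what I posted above (though they are a bit more involved). Here are a couple of blog posts we wrote about setting up autocomplete: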
http://blog.qbox.io/quick-and-dirty-autocomplete-with-elasticsearch-completion-suggest
http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
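As one more option that is not part of the answer above: the terms aggregation accepts an "include" regular expression, which restricts the returned buckets to terms matching it. The sketch below builds such a request body in Python. It assumes the tags field is indexed without analysis (otherwise "mango shakes" is split into separate tokens), and the helper names distinct_tags_request and bucket_keys are hypothetical:

```python
import json
from typing import Any, Dict, List


def distinct_tags_request(prefix: str, size: int = 10) -> Dict[str, Any]:
    """Build a search body whose aggregation only keeps terms starting with `prefix`."""
    # Assumption: the prefix is plain letters/digits, so no regex escaping is needed.
    if not prefix.isalnum():
        raise ValueError("this sketch only handles plain alphanumeric prefixes")
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {"prefix": {"tags": {"value": prefix}}},
        "aggs": {
            "distinct_tags": {
                "terms": {"field": "tags", "include": prefix + ".*", "size": size}
            }
        },
    }


def bucket_keys(response: Dict[str, Any]) -> List[str]:
    """Pull the distinct tag values out of the aggregation buckets."""
    return [b["key"] for b in response["aggregations"]["distinct_tags"]["buckets"]]


print(json.dumps(distinct_tags_request("g"), indent=2))
```

The prefix query narrows the documents, while "include" narrows the buckets; without the latter, the aggregation also counts the non-matching tags of every matching document, which is the behaviour described in the question.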
Following @Sloan Ahrens's suggestion, I did the following:
Updated the mapping:
"tags": {
"type": "completion",
"context": {
"filter_color": {
"type": "category",
"default": "",
"path": "fruits.color"
},
"filter_type": {
"type": "category",
"default": "",
"path": "fruits.type"
}
}
}
Reference: ES API Guide
Indexed these documents:
{
_id: "1",
tags: {input: ["guava","apple","mango", "banana", "gulmohar"]},
fruits: {color: 'bar', type: 'alice'}
}
{
_id: "2",
tags: {input: ["orange","guava", "mango shakes", "apple pie", "grammar"]},
fruits: {color: 'foo', type: 'bob'}
}
{
_id: "3",
tags: {input: ["apple","grapes", "water", "gulmohar","water-melon", "green"]},
fruits: {color: 'foo', type: 'alice'}
}
I did not need to modify my original index much. I just added input before the tags array.
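The change described above, wrapping the existing tags array in an object with an input key, can be sketched as a small conversion step before indexing. This is only an illustration; the helper name to_completion_doc is hypothetical:

```python
from typing import Any, Dict


def to_completion_doc(doc: Dict[str, Any], field: str = "tags") -> Dict[str, Any]:
    """Return a copy of `doc` with the plain tag list wrapped as {"input": [...]}."""
    converted = dict(doc)  # shallow copy, so the original document is left untouched
    converted[field] = {"input": list(doc[field])}
    return converted


original = {
    "tags": ["guava", "apple", "mango", "banana", "gulmohar"],
    "fruits": {"color": "bar", "type": "alice"},
}

print(to_completion_doc(original))
```

All other fields, such as fruits, pass through unchanged, so the context paths in the mapping keep pointing at the same values.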
POST rescu1/_suggest?pretty
{
"suggest": {
"text": "g",
"completion": {
"field": "tags",
"size": 10,
"context": {
"filter_color": "bar",
"filter_type": "alice"
}
}
}
}
This gave me the desired output.
I accepted @Sloan Ahrens's answer because his suggestion worked well for me and pointed me in the right direction.