How to find the distinct values of an array field across all indexed documents using elasticsearch-dsl?
I am using elasticsearch-dsl in Django. I have defined a DocType document with a keyword field that holds a list of values.
Here is my code:
from elasticsearch_dsl import DocType, Text, Keyword


class ProductIndex(DocType):
    """
    Index for products
    """
    id = Keyword()
    slug = Keyword()
    name = Text()
    filter_list = Keyword()
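Here filter_list is an array holding multiple values. I also have a list of distinct values, say sample_filter_list, some of whose elements may appear in the filter_list of some products. Given sample_filter_list, I want all the unique elements of filter_list across every product whose filter_list has a non-empty intersection with sample_filter_list.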
For example, I have 5 products whose filter_list values are:
1) ['a', 'b', 'c']
2) ['d', 'e', 'f']
3) ['g', 'h', 'i']
4) ['j', 'k', 'l']
5) ['m', 'n', 'o']
and if my sample_filter_list is ['a', 'd', 'g', 'j', 'm'],
then Elasticsearch should return an array containing the distinct elements,
i.e. ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o']
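To pin down the expected result, the requirement can be modelled in plain Python without Elasticsearch. This is only a sketch of the semantics; the function name distinct_values_of_matching and the sorted return order are assumptions made for illustration.

```python
def distinct_values_of_matching(products, sample_filter_list):
    """Union of every filter_list that shares at least one value with the sample."""
    sample = set(sample_filter_list)
    distinct = set()
    for filter_list in products:
        # Keep only products whose filter_list intersects the sample.
        if sample.intersection(filter_list):
            distinct.update(filter_list)
    # Sorted only to make the output deterministic.
    return sorted(distinct)


products = [
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
    ["j", "k", "l"],
    ["m", "n", "o"],
]
sample_filter_list = ["a", "d", "g", "j", "m"]

print(distinct_values_of_matching(products, sample_filter_list))
```

A product whose filter_list shares no value with sample_filter_list contributes nothing to the result.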
I'm a bit confused about what you want. Querying only the products whose filter_list intersects sample_filter_list is just a matter of running a terms query:
ProductIndex.search().filter('terms', filter_list=sample_filter_list)
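If the goal is the distinct filter_list values of those matching products rather than the products themselves, one option is to pair the terms filter with a terms aggregation on the same field. The sketch below only builds the raw search request body as a plain dict; the helper name build_distinct_values_body, the aggregation name distinct_filters, and the bucket limit of 10000 are assumptions, not part of the original code.

```python
def build_distinct_values_body(sample_filter_list, max_buckets=10000):
    """Build a search body: filter on the sample, then bucket every filter_list value."""
    return {
        # No hits are needed, only the aggregation result.
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"filter_list": list(sample_filter_list)}}
                ]
            }
        },
        "aggs": {
            # Hypothetical aggregation name; one bucket per distinct value.
            "distinct_filters": {
                "terms": {"field": "filter_list", "size": max_buckets}
            }
        },
    }


body = build_distinct_values_body(["a", "d", "g", "j", "m"])
print(body["aggs"]["distinct_filters"]["terms"]["field"])
```

The aggregation only sees documents that pass the filter, so the buckets should cover the values of the matching products only. If there can be more distinct values than max_buckets, the list will be truncated.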
Hope this helps!
This answer is general rather than specific to Django.
Suppose you have an Elasticsearch index some_index2 with the following mapping:
PUT some_index2
{
  "mappings": {
    "some_type": {
      "dynamic_templates": [
        {
          "strings": {
            "mapping": {
              "type": "string"
            },
            "match_mapping_type": "string"
          }
        }
      ],
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        }
      }
    }
  }
}
and that you have indexed the following documents:
{
  "field1": "id1",
  "field2": ["a", "b", "c", "d"]
}
{
  "field1": "id2",
  "field2": ["e", "f", "g"]
}
{
  "field1": "id3",
  "field2": ["e", "l", "k"]
}
As you stated, you want all the distinct values of field2 (filter_list in your case). You can get them with an Elasticsearch terms aggregation:
GET some_index2/_search
{
  "aggs": {
    "some_name": {
      "terms": {
        "field": "field2",
        "size": 10000
      }
    }
  },
  "size": 0
}
which returns a result like this:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "some_name": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "e",
          "doc_count": 2
        },
        {
          "key": "a",
          "doc_count": 1
        },
        {
          "key": "b",
          "doc_count": 1
        },
        {
          "key": "c",
          "doc_count": 1
        },
        {
          "key": "d",
          "doc_count": 1
        },
        {
          "key": "f",
          "doc_count": 1
        },
        {
          "key": "g",
          "doc_count": 1
        },
        {
          "key": "k",
          "doc_count": 1
        },
        {
          "key": "l",
          "doc_count": 1
        }
      ]
    }
  }
}
Here buckets contains one entry per distinct value.
Iterate through the buckets and read the value stored under "key" in each one.
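As a minimal sketch of that iteration, assuming the response has already been parsed into a Python dict, the keys can be collected with a list comprehension. The helper name distinct_keys is hypothetical, and the sample response is a trimmed copy of the result shown in this answer.

```python
def distinct_keys(response, agg_name="some_name"):
    """Return the bucket keys of a terms aggregation, in the order returned."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Trimmed copy of the response above: only the first two buckets are kept.
response = {
    "aggregations": {
        "some_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "e", "doc_count": 2},
                {"key": "a", "doc_count": 1},
            ],
        }
    }
}

print(distinct_keys(response))  # prints ['e', 'a']
```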
Hope this is what you need.