Elasticsearch: find multiple unique values in an array with complex objects
Say there is an index with documents following a structure like:
{
"array": [
{
"field1": 1,
"field2": 2
},
{
"field1": 3,
"field2": 2
},
{
"field1": 3,
"field2": 2
},
...
]
}
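Is it possible to define a query that returns the documents that have multiple unique values for a field?
For the example above, a query searching on field2 would not return the document, because all entries have the same value, but a query searching on field1 would return it, because it has the values 1 and 3.
The only thing I can think of is to store the unique values in the parent object and then query for its length, but since this seems trivial, I was hoping not to have to change the structure to something like: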
{
"arrayField1Values" : [1, 3],
"arrayField2Values" : [2],
"array": [
{
"field1": 1,
"field2": 2
},
{
"field1": 3,
"field2": 2
},
{
"field1": 3,
"field2": 2
},
...
]
}
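For comparison, the denormalized structure above could be produced on the client before indexing. The following is only a minimal sketch: the helper name `with_unique_values` is hypothetical, and it assumes every document keeps its objects under an `array` key as in the example.

```python
def with_unique_values(doc, fields=("field1", "field2")):
    """Return a copy of doc with one list of distinct values per field."""
    enriched = dict(doc)
    for name in fields:
        # Collect the distinct values of this field across all array entries.
        values = sorted({item[name] for item in doc["array"] if name in item})
        # "field1" becomes "arrayField1Values", following the naming in the question.
        enriched["array" + name[0].upper() + name[1:] + "Values"] = values
    return enriched


doc = {
    "array": [
        {"field1": 1, "field2": 2},
        {"field1": 3, "field2": 2},
        {"field1": 3, "field2": 2},
    ]
}
enriched = with_unique_values(doc)
print(enriched["arrayField1Values"], enriched["arrayField2Values"])
```

The trade-off is that the extra fields have to be recomputed whenever the array changes.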
Thanks to anyone who can help!
My hunch was to go with the nested data type, but then I realized you could do a simple distinct count of the array values of field1 and field2 using query scripts and top_hits:
PUT array
POST array/_doc
{
"array": [
{
"field1": 1,
"field2": 2
},
{
"field1": 3,
"field2": 2
},
{
"field1": 3,
"field2": 2
}
]
}
GET array/_search
{
"size": 0,
"aggs": {
"field1_is_unique": {
"filter": {
"script": {
"script": {
"source": "def uniques = doc['array.field1'].stream().distinct().sorted().collect(Collectors.toList()); return uniques.length > 1 ;",
"lang": "painless"
}
}
},
"aggs": {
"top_hits_field1": {
"top_hits": {}
}
}
},
"field2_is_unique": {
"filter": {
"script": {
"script": {
"source": "def uniques = doc['array.field2'].stream().distinct().sorted().collect(Collectors.toList()); return uniques.length > 1 ;",
"lang": "painless"
}
}
},
"aggs": {
"top_hits_field2": {
"top_hits": {}
}
}
}
}
}
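The two aggregations in the request differ only in the field name, so the body can be generated rather than written by hand. This is a sketch under assumptions: the helper names `distinct_filter_agg` and `build_search_body` are made up for illustration, and the Painless source is copied unchanged from the request above. It only builds the JSON body and does not talk to a cluster.

```python
import json


def distinct_filter_agg(field, path="array"):
    """Build one filter aggregation that keeps documents with more than one distinct value."""
    source = (
        f"def uniques = doc['{path}.{field}'].stream().distinct().sorted()"
        ".collect(Collectors.toList()); return uniques.length > 1;"
    )
    return {
        "filter": {"script": {"script": {"source": source, "lang": "painless"}}},
        # Sub-aggregation that returns the matching documents themselves.
        "aggs": {f"top_hits_{field}": {"top_hits": {}}},
    }


def build_search_body(fields):
    """Build the full search body with one '<field>_is_unique' aggregation per field."""
    return {
        "size": 0,
        "aggs": {f"{field}_is_unique": distinct_filter_agg(field) for field in fields},
    }


body = build_search_body(["field1", "field2"])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `GET array/_search` or sent with whichever client is in use.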
yielding separate aggregations for whether field1 or field2 contains more than one unique value:
"aggregations" : {
"field1_is_unique" : {
"doc_count" : 1,
"top_hits_field1" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "array",
"_type" : "_doc",
"_id" : "WbJhgnEBVBaNYdXKNktL",
"_score" : 1.0,
"_source" : {
"array" : [
{
"field1" : 1,
"field2" : 2
},
{
"field1" : 3,
"field2" : 2
},
{
"field1" : 3,
"field2" : 2
}
]
}
}
]
}
}
},
"field2_is_unique" : {
"doc_count" : 0,
"top_hits_field2" : {
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
}
}
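To turn a response like the one above into a list of matching document IDs per field, something along these lines might work. It is a sketch only: the function name `fields_with_multiple_values` is hypothetical, and it assumes the aggregation names follow the `<field>_is_unique` / `top_hits_<field>` pattern used in the request.

```python
def fields_with_multiple_values(response):
    """Map each field to the IDs of the documents returned by its top_hits sub-aggregation."""
    suffix = "_is_unique"
    result = {}
    for agg_name, agg in response["aggregations"].items():
        # Strip the "_is_unique" suffix to recover the field name.
        field = agg_name[: -len(suffix)] if agg_name.endswith(suffix) else agg_name
        hits = agg.get("top_hits_" + field, {}).get("hits", {}).get("hits", [])
        result[field] = [hit["_id"] for hit in hits]
    return result


# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "field1_is_unique": {
            "doc_count": 1,
            "top_hits_field1": {"hits": {"hits": [{"_id": "WbJhgnEBVBaNYdXKNktL"}]}},
        },
        "field2_is_unique": {
            "doc_count": 0,
            "top_hits_field2": {"hits": {"hits": []}},
        },
    }
}
matches = fields_with_multiple_values(response)
print(matches)
```

Note that `top_hits` returns only three hits by default, so its `size` would need to be raised to list more matching documents; `doc_count` always reflects the full number of matches.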
Hope it helps.