How to sort by an element inside a deep object in Elasticsearch?
I have this mapping:
{
  "foos": {
    "mappings": {
      "foo": {
        "dynamic": "false",
        "properties": {
          "some_id": {
            "type": "integer"
          },
          "language": {
            "type": "text"
          },
          "locations": {
            "type": "integer"
          },
          "name": {
            "type": "text",
            "term_vector": "yes",
            "analyzer": "name_analyzer"
          },
          "popularity": {
            "type": "integer"
          },
          "some_deep_count": {
            "type": "object"
          }
        }
      }
    }
  }
}
A sample document looks like this:
{
  "name": "Some nice name",
  "some_id": 1,
  "id": 4378,
  "popularity": 525,
  "some_deep_count": {
    "0": {
      "32026": 344,
      "55625": 458,
      "29": 1077,
      "55531": 1081,
      ...
    },
    "1": {
      "32026": 57,
      "55625": 60,
      "29": 88,
      ...
    }
  },
  "locations": [
    32026,
    55625,
    ...
  ],
  "language": [
    "es",
    "en"
  ]
}
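The some_deep_count field can only contain the keys "0" and "1", each of which can hold a very long list of id => value pairs (dynamic, cannot be configured in advance).
This works well when filtering:
"_source": [
  "id",
  "some_deep_count.*.55529"
],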
But I can't figure out how to sort by any of the deep objects. I need a deep sum, as in the following expression:
...
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "def deep0 = 0; def deep1 = 0; if(doc.containsKey('some_deep_count.0.55529')) { deep0 = doc['some_deep_count.0.55529'] } if(doc.containsKey('some_deep_count.1.55529')) { deep1 = doc['some_deep_count.1.55529'] } return deep0 + deep1"
      },
      "order": "desc"
    }
  }
}
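Unfortunately, the sort field always comes back as 0, because doc.containsKey('some_deep_count.0.55529') is always false. So is doc.containsKey('some_deep_count').
Interestingly, doc.containsKey('some_id') does work, and I really don't understand why.
Edit
In response to Val's suggestion, I am attaching the full request/response.
Request: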
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "def ps0 = 0; if(doc.containsKey('some_deep_count.0.55529')) { ps0 = doc['some_deep_count.0.55529'].value; } return ps0 "
      },
      "order": "desc"
    }
  },
  "_source": [
    "id",
    "some_deep_count.0.55529"
  ],
  "size": 1
}
Response:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2121,
    "max_score": null,
    "hits": [
      {
        "_index": "foos",
        "_type": "foo",
        "_id": "5890",
        "_score": null,
        "_source": {
          "some_deep_count": {
            "0": {
              "55529": 228
            }
          },
          "id": 5890
        },
        "sort": [
          0.0
        ]
      }
    ]
  }
}
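The problem probably lies in the condition. In fact, even with a sort as simple as "def ps0 = 0; if(doc.containsKey('some_deep_count')) { ps0 = 99999; } return ps0 " I still get "sort":[0.0], which suggests that something is wrong with the clause doc.containsKey('some_deep_count').
Edit 2
The index as returned by curl -XGET localhost:9200/foos looks like this: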
{
  "foos": {
    "aliases": {},
    "mappings": {
      "foo": {
        "dynamic": "false",
        "properties": {
          "some_id": {
            "type": "integer"
          },
          "language": {
            "type": "text"
          },
          "locations": {
            "type": "integer"
          },
          "name": {
            "type": "text",
            "term_vector": "yes",
            "analyzer": "name_analyzer"
          },
          "popularity": {
            "type": "integer"
          },
          "some_deep_count": {
            "type": "object"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "foos",
        "creation_date": "1576168104248",
        "analysis": {
          "analyzer": {
            "name_analyzer": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "keyword"
            }
          }
        },
        "number_of_replicas": "0",
        "uuid": "26xckWaOQuuxFrMvIdikvw",
        "version": {
          "created": "6020199"
        }
      }
    }
  }
}
Thanks

I was able to get a non-zero sort value by reproducing your case as follows:
# 1. Create the index mapping
PUT sorts
{
  "mappings": {
    "properties": {
      "some_deep_count": {
        "type": "object"
      }
    }
  }
}
# 2. Index a sample document
PUT sorts/_doc/1
{
  "some_deep_count": {
    "0": {
      "29": 1077,
      "32026": 344,
      "55531": 1081,
      "55625": 458
    },
    "1": {
      "29": 88,
      "32026": 57,
      "55625": 60
    }
  }
}
# 3. Sort the results
POST sorts/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": """
          def deep0 = 0;
          def deep1 = 0;
          if(doc.containsKey('some_deep_count.0.55531')) {
            deep0 = doc['some_deep_count.0.55531'].value;
          }
          if(doc.containsKey('some_deep_count.1.55531')) {
            deep1 = doc['some_deep_count.1.55531'].value;
          }
          return deep0 + deep1;
        """
      },
      "order": "desc"
    }
  }
}
Result => sort = 1081
"hits" : [
  {
    "_index" : "sorts",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : null,
    "_source" : {
      "some_deep_count" : {
        "0" : {
          "29" : 1077,
          "32026" : 344,
          "55531" : 1081,
          "55625" : 458
        },
        "1" : {
          "29" : 88,
          "32026" : 57,
          "55625" : 60
        }
      }
    },
    "sort" : [
      1081.0
    ]
  }
]
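As you can see, the id I used, 55531, exists in some_deep_count.0 but not in some_deep_count.1, and the result is 1081, which is correct.
Using *.29, which is present in both 0 and 1, yields 1165, which is also correct (1077 + 88).
The only difference between my script and yours is that you need to add .value to the doc field references when assigning deep0 and deep1.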
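Since the ids are dynamic, it may be more convenient to pass the id as a script parameter instead of hard-coding it in the script source. The following is only a sketch under that assumption and is not part of the reproduction above: the parameter name id and the size() > 0 guards are my own additions, and it targets the same sorts test index.

```
POST sorts/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "params": {
          "id": "29"
        },
        "source": """
          // Build both field names from the (hypothetical) 'id' parameter.
          def k0 = 'some_deep_count.0.' + params.id;
          def k1 = 'some_deep_count.1.' + params.id;
          def total = 0;
          // containsKey() checks that the field is mapped;
          // size() checks that this document has a value for it.
          if (doc.containsKey(k0) && doc[k0].size() > 0) {
            total += doc[k0].value;
          }
          if (doc.containsKey(k1) && doc[k1].size() > 0) {
            total += doc[k1].value;
          }
          return total;
        """
      },
      "order": "desc"
    }
  }
}
```

Changing params.id switches the id being summed without touching the script source.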
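Update
The problem is that you specified dynamic: false in your mapping. With that setting, if you index new fields that were not in the mapping when the index was created, the mapping is not updated. So, as it stands, none of the sub-fields you send inside some_deep_count are actually indexed (they are still kept in _source, which is why the source filtering works), and that is why you always get 0. Remove dynamic: false and everything will work as expected.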
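If the other fields should stay strictly mapped, one option may be to keep dynamic: false at the root and enable dynamic mapping only on the some_deep_count object, since the dynamic setting can be overridden per object. A minimal, untested sketch, assuming a new index (the name foos_v2 is hypothetical) in the typed 6.x mapping format from the question, with the other properties left out for brevity:

```
PUT foos_v2
{
  "mappings": {
    "foo": {
      "dynamic": "false",
      "properties": {
        "some_deep_count": {
          "type": "object",
          "dynamic": "true"
        }
      }
    }
  }
}
```

Documents indexed before the mapping change are not picked up retroactively, so they need to be reindexed. Every distinct id also becomes its own mapped field, so with very long id lists the index.mapping.total_fields.limit setting (1000 by default) may need attention.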