Elasticsearch sort based on element in array that satisfies filter
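My type has a field that is an array of times in ISO 8601 format. I want to get all listings that have a time on a given day, and then sort them by the earliest time they have on that specific day. The problem is that my query sorts by the earliest time across *all* days.
You can reproduce the problem with the script below.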
curl -XPUT 'localhost:9200/listings?pretty'
curl -XPOST 'localhost:9200/listings/listing/_bulk?pretty' -d '
{"index": { } }
{ "name": "second on 6th (3rd on the 5th)", "times": ["2018-12-05T12:00:00","2018-12-06T11:00:00"] }
{"index": { } }
{ "name": "third on 6th (1st on the 5th)", "times": ["2018-12-05T10:00:00","2018-12-06T12:00:00"] }
{"index": { } }
{ "name": "first on the 6th (2nd on the 5th)", "times": ["2018-12-05T11:00:00","2018-12-06T10:00:00"] }
'
# wait for the index refresh so the new documents are searchable
sleep 2
echo "Query listings on the 6th!"
curl -XPOST 'localhost:9200/listings/_search?pretty' -d '
{
  "sort": {
    "times": {
      "order": "asc",
      "nested_filter": {
        "range": {
          "times": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "filter": {
        "range": {
          "times": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  }
}'
curl -XDELETE 'localhost:9200/listings?pretty'
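Putting the script above in a .sh file and running it reproduces the problem: you will see that the results are ordered by the times on the 5th rather than the 6th. Elasticsearch converts each time to an epoch_millis number for sorting, and you can see that number in the sort field of each hit, e.g. 1544007600000. For an asc sort it takes the smallest number in the array (the order of the elements in the array doesn't matter) and sorts on that.
Somehow I need to order by the earliest time on the queried date, i.e. the 6th.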
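The behaviour described above can be imitated outside Elasticsearch to make it concrete. The following is a sketch, not Elasticsearch code: it converts each listing's times to epoch milliseconds and, like an asc sort on a multi-valued field, keeps the smallest value per listing, once across all days and once restricted to the 6th. It assumes bash and GNU `date` (for `-d` parsing of ISO 8601 timestamps) and reads the timestamps as UTC, which is what Elasticsearch does for dates without a time zone. The helper names `to_millis` and `sort_keys` are made up for this illustration.

```bash
#!/usr/bin/env bash
# Not Elasticsearch code: a local imitation of how an "asc" sort picks one
# value from a multi-valued date field. Assumes GNU date for ISO 8601 parsing.
set -euo pipefail

# Hypothetical helper: ISO 8601 timestamp (no zone, read as UTC) -> epoch millis.
to_millis() {
  echo $(( $(date -u -d "$1" +%s) * 1000 ))
}

# The three listings from the question, encoded as "name|time,time".
listings=(
  "second on 6th (3rd on the 5th)|2018-12-05T12:00:00,2018-12-06T11:00:00"
  "third on 6th (1st on the 5th)|2018-12-05T10:00:00,2018-12-06T12:00:00"
  "first on the 6th (2nd on the 5th)|2018-12-05T11:00:00,2018-12-06T10:00:00"
)

# Hypothetical helper: print "<millis>\t<name>" per listing, ascending.
# The key is the smallest time in the listing; if a day prefix is given,
# only times on that day are considered.
sort_keys() {
  local day="$1" entry name t ms best
  local -a times
  for entry in "${listings[@]}"; do
    name="${entry%%|*}"
    IFS=',' read -r -a times <<< "${entry#*|}"
    best=""
    for t in "${times[@]}"; do
      if [[ -n "$day" && "$t" != "$day"* ]]; then
        continue
      fi
      ms=$(to_millis "$t")
      if [[ -z "$best" || "$ms" -lt "$best" ]]; then
        best="$ms"
      fi
    done
    if [[ -n "$best" ]]; then
      printf '%s\t%s\n' "$best" "$name"
    fi
  done | sort -n
}

echo "Smallest time on any day (what the asc sort uses):"
sort_keys ""
echo "Smallest time on the 6th (what the question wants):"
sort_keys "2018-12-06"
```

The first list comes out as third, first, second (ordered by the 5th, as observed in the question); the second comes out as first, second, third, with keys 1544090400000, 1544094000000 and 1544097600000.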
I'm currently using Elasticsearch 2.4, but it would also be great if someone could show me how it's done in the current version.
In case it helps, here are their docs on nested queries and scripting.
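I think the problem here is that nested sorting works on nested objects, not on arrays.
If you convert your documents to use an array of nested objects instead of a plain array of dates, you can build a nested filtered sort that works.
The following is for Elasticsearch 6.0; as of 6.1 the syntax changed slightly, and I'm not sure how much of it applies to 2.x:
Mapping: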
PUT nested-listings
{
  "mappings": {
    "listing": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "openTimes": {
          "type": "nested",
          "properties": {
            "date": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}
Data:
POST nested-listings/listing/_bulk
{"index": { } }
{ "name": "second on 6th (3rd on the 5th)", "openTimes": [ { "date": "2018-12-05T12:00:00" }, { "date": "2018-12-06T11:00:00" }] }
{"index": { } }
{ "name": "third on 6th (1st on the 5th)", "openTimes": [ {"date": "2018-12-05T10:00:00"}, { "date": "2018-12-06T12:00:00" }] }
{"index": { } }
{ "name": "first on the 6th (2nd on the 5th)", "openTimes": [ {"date": "2018-12-05T11:00:00" }, { "date": "2018-12-06T10:00:00" }] }
So instead of "nextNexpectionOpenTimes" we have an "openTimes" nested object, and each listing contains an array of openTimes.
Now the search:
POST nested-listings/_search
{
  "sort": {
    "openTimes.date": {
      "order": "asc",
      "nested_path": "openTimes",
      "nested_filter": {
        "range": {
          "openTimes.date": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  },
  "query": {
    "nested": {
      "path": "openTimes",
      "query": {
        "bool": {
          "filter": {
            "range": {
              "openTimes.date": {
                "gte": "2018-12-06T00:00:00",
                "lte": "2018-12-06T23:59:59"
              }
            }
          }
        }
      }
    }
  }
}
The main difference is in the query: you need a "nested" query to filter on fields of nested objects.
This gives the following results:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "nested-listings",
        "_type": "listing",
        "_id": "vHH6e2cB28sphqox2Dcm",
        "_score": null,
        "_source": {
          "name": "first on the 6th (2nd on the 5th)"
        },
        "sort": [
          1544090400000
        ]
      },
      {
        "_index": "nested-listings",
        "_type": "listing",
        "_id": "unH6e2cB28sphqox2Dcm",
        "_score": null,
        "_source": {
          "name": "second on 6th (3rd on the 5th)"
        },
        "sort": [
          1544094000000
        ]
      },
      {
        "_index": "nested-listings",
        "_type": "listing",
        "_id": "u3H6e2cB28sphqox2Dcm",
        "_score": null,
        "_source": {
          "name": "third on 6th (1st on the 5th)"
        },
        "sort": [
          1544097600000
        ]
      }
    ]
  }
}
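From 6.1 on, the `nested_path` and `nested_filter` sort options are deprecated in favour of a single `nested` object with `path` and `filter` keys. A sketch of the same search in that form, assuming the `nested-listings` index and data from above on a node at `localhost:9200` (6.x also requires the `Content-Type` header on requests with a body); it should return the same order as above, though this exact request is untested here:

```bash
#!/usr/bin/env bash
# Same search as above, written with the "nested" sort object (6.1+).
# Assumes the nested-listings index from above exists on localhost:9200.
curl -XPOST 'localhost:9200/nested-listings/_search?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "sort": [
    {
      "openTimes.date": {
        "order": "asc",
        "nested": {
          "path": "openTimes",
          "filter": {
            "range": {
              "openTimes.date": {
                "gte": "2018-12-06T00:00:00",
                "lte": "2018-12-06T23:59:59"
              }
            }
          }
        }
      }
    }
  ],
  "query": {
    "nested": {
      "path": "openTimes",
      "query": {
        "bool": {
          "filter": {
            "range": {
              "openTimes.date": {
                "gte": "2018-12-06T00:00:00",
                "lte": "2018-12-06T23:59:59"
              }
            }
          }
        }
      }
    }
  }
}'
```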
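I don't think you can actually select a single value from an array in ES, so when sorting you are always sorting on all of the values. The best you can do with a plain array is choose how the array is reduced to one value for sorting (lowest, highest, average, etc.).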
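That choice is the sort `mode` option (`min`, `max`, `sum`, `avg`, `median`); an asc sort defaults to `min`, which is why the question's results follow the 5th. A sketch against the question's `listings` index, assuming it still exists (run it before the script's final `curl -XDELETE`); the `Content-Type` header is needed on 6.x and should be harmless on 2.4:

```bash
#!/usr/bin/env bash
# Sort the listings on the 6th by the LATEST value in each "times" array.
# Assumes the "listings" index from the question's script still exists.
curl -XPOST 'localhost:9200/listings/_search?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "sort": [
    { "times": { "order": "asc", "mode": "max" } }
  ],
  "query": {
    "bool": {
      "filter": {
        "range": {
          "times": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  }
}'
```

With this sample data, `max` happens to give the desired order (first, second, third), but only because the 6th is the last day in every array, so each listing's maximum is its time on the 6th. As soon as a listing also has a time on a later day, that later time becomes its sort key, which is why a per-day sort needs the nested approach.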