How to perform date arithmetic between nested and unnested dates in Elasticsearch?
Consider the following Elasticsearch (v5.4) object (of document type "award"):
{
  "name": "Gold 1000",
  "date": "2017-06-01T16:43:00.000+00:00",
  "recipient": {
    "name": "James Conroy",
    "date_of_birth": "1991-05-30"
  }
}
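Both award.date and award.recipient.date_of_birth are mapped as type "date" (and recipient itself is mapped as nested).
I want to run a range aggregation that gives the age ranges of the award's recipients at the time they received it ("Under 18", "18-24", "24-30", "30+"). I tried the following aggregation query: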
{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "recipients": {
      "nested": {
        "path": "recipient"
      },
      "aggs": {
        "age_ranges": {
          "range": {
            "script": {
              "inline": "doc['date'].date - doc['recipient.date_of_birth'].date"
            },
            "keyed": true,
            "ranges": [{
              "key": "Under 18",
              "from": 0,
              "to": 18
            }, {
              "key": "18-24",
              "from": 18,
              "to": 24
            }, {
              "key": "24-30",
              "from": 24,
              "to": 30
            }, {
              "key": "30+",
              "from": 30,
              "to": 100
            }]
          }
        }
      }
    }
  }
}
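Separately from the type error below, note that the ranges are expressed in years (0 to 100), while the difference between two timestamps is naturally a duration, for example in milliseconds. So even if the subtraction were allowed, the value would not be an age in years and would fall outside every bucket. A small plain Java sketch (ordinary java.time, not Painless; the class and method names are made up for illustration) using the first sample recipient:

```java
import java.time.Duration;
import java.time.Instant;

public class RawDifference {
    // Difference between two instants in milliseconds, i.e. what subtracting
    // the underlying epoch-millisecond values would give.
    static long diffMillis(String birthIso, String awardIso) {
        return Duration.between(Instant.parse(birthIso), Instant.parse(awardIso)).toMillis();
    }

    public static void main(String[] args) {
        long ms = diffMillis("1991-05-30T00:00:00Z", "2016-06-01T16:43:00Z");
        // Hundreds of billions of milliseconds: far outside the 0..100 ranges.
        System.out.println(ms);
        // Even converted to whole days this is still not a number of years.
        System.out.println(Duration.ofMillis(ms).toDays()); // 9134
    }
}
```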
Problem 1
However, because of the date subtraction in the script section, I get the following error:
Cannot apply [-] operation to types [org.joda.time.DateTime] and [org.joda.time.MutableDateTime].
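The DateTime object is the award.date field and the MutableDateTime object is the award.recipient.date_of_birth field. I tried something like doc['recipient.date_of_birth'].date.toDateTime() (it doesn't work, even though the Joda documentation says MutableDateTime inherits this method from a parent class). I also tried going a step further with something like this: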
"script": "ChronoUnit.YEARS.between(doc['date'].date, doc['recipient.date_of_birth'].date)"
Unfortunately, that didn't work either :(
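The ChronoUnit attempt probably fails for two separate reasons. First, in Elasticsearch 5.x doc['...'].date returns a Joda-Time object, and Joda-Time types do not implement java.time's Temporal interface, so they would have to be converted to java.time values first. Second, ChronoUnit.YEARS.between(start, end) counts from start to end, so passing the award date first gives a negative result. A plain Java sketch with java.time.LocalDate values (the class name is made up for illustration):

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class ChronoUnitOrder {
    // Whole years from start to end; negative when end is before start.
    static long yearsBetween(LocalDate start, LocalDate end) {
        return ChronoUnit.YEARS.between(start, end);
    }

    public static void main(String[] args) {
        LocalDate birth = LocalDate.of(1991, 5, 30);
        LocalDate award = LocalDate.of(2016, 6, 1);
        // Argument order as in the question's script: award date first.
        System.out.println(yearsBetween(award, birth)); // -25
        // Date of birth first gives the age at award time.
        System.out.println(yearsBetween(birth, award)); // 25
    }
}
```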
Problem 2
I noticed that if I do this:
"aggs": {
  "recipients": {
    "nested": {
      "path": "recipient"
    },
    "aggs": {
      "award_years": {
        "terms": {
          "script": {
            "inline": "doc['date'].date.year"
          }
        }
      }
    }
  }
}
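I get a single 1970 bucket whose doc_count is exactly equal to the total number of documents in ES. This leads me to believe that accessing a property outside the nested object simply doesn't work and returns some default value, such as the epoch datetime. If I do the opposite (aggregate on the date of birth without the nested aggregation), I get exactly the same result for all dates of birth (1970, the epoch datetime). So how can I compare these two dates?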
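A likely explanation for the 1970 buckets, consistent with what is observed here: inside the nested aggregation the script runs against the hidden nested recipient document, which has no date field, and an empty date doc value in Elasticsearch 5.x reads as the epoch (0 ms); the reverse happens for recipient.date_of_birth at the root level. This plain Java sketch does not touch Elasticsearch; it only shows that 0 ms since the epoch falls in the year 1970 in UTC. The class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class EpochDefault {
    // Year (in UTC) of a timestamp given as milliseconds since 1970-01-01T00:00:00Z.
    static int utcYear(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).getYear();
    }

    public static void main(String[] args) {
        // A missing date that reads as 0 ms is the epoch itself.
        System.out.println(utcYear(0L)); // 1970
        // The first sample award date, for comparison.
        System.out.println(utcYear(Instant.parse("2016-06-01T16:43:00Z").toEpochMilli())); // 2016
    }
}
```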
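I'm racking my brain here, and I feel like there's some clever solution beyond my current Elasticsearch expertise. Help!
If you want to set up a quick environment to help me with this, here are some curl commands: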
curl -XDELETE http://localhost:9200/joelinux
curl -XPUT http://localhost:9200/joelinux -d "{\"mappings\": {\"award\": {\"properties\": {\"name\": {\"type\": \"string\"}, \"date\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ\"}, \"recipient\": {\"type\": \"nested\", \"properties\": {\"name\": {\"type\": \"string\"}, \"date_of_birth\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd\"}}}}}}}"
curl -XPUT http://localhost:9200/joelinux/award/1 -d '{"name": "Gold 1000", "date": "2016-06-01T16:43:00.000000+00:00", "recipient": {"name": "James Conroy", "date_of_birth": "1991-05-30"}}'
curl -XPUT http://localhost:9200/joelinux/award/2 -d '{"name": "Gold 1000", "date": "2017-02-28T13:36:00.000000+00:00", "recipient": {"name": "Martin McNealy", "date_of_birth": "1983-01-20"}}'
That should give you a "joelinux" index and two "award" documents to test with ("James Conroy" and "Martin McNealy"). Thanks in advance!
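Unfortunately, you can't access nested and non-nested fields in the same context: inside the nested aggregation, doc only sees the fields of the nested recipient document, and at the root level it only sees the root fields, which is why the missing field comes back as the epoch (1970). As a workaround, you can change the mapping to use the copy_to option, which automatically copies the date of birth from the nested document into the root document at index time (documents indexed before the change have to be reindexed for the root field to be filled):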
{
  "mappings": {
    "award": {
      "properties": {
        "name": {
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          },
          "type": "text"
        },
        "date": {
          "type": "date"
        },
        "date_of_birth": {
          "type": "date" // will be automatically filled when indexing documents
        },
        "recipient": {
          "properties": {
            "name": {
              "fields": {
                "keyword": {
                  "ignore_above": 256,
                  "type": "keyword"
                }
              },
              "type": "text"
            },
            "date_of_birth": {
              "type": "date",
              "copy_to": "date_of_birth" // copy value to root document
            }
          },
          "type": "nested"
        }
      }
    }
  }
}
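To make the effect of this mapping concrete, here is a hypothetical toy model in plain Java (not Elasticsearch code; all names are made up): each recipient becomes its own hidden nested document, and with copy_to each date_of_birth is also appended to the root document, so a root-level script can see both date and date_of_birth. One caveat the model also shows: if an award had several recipients, the root date_of_birth field would hold several values, and a script reading doc['date_of_birth'].date would likely only see one of them (the earliest, since doc values are kept sorted).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CopyToModel {
    // Toy "document": field name -> list of values.
    static Map<String, List<String>> newDoc() {
        return new LinkedHashMap<>();
    }

    // Indexes one award: one nested document per recipient plus the root document.
    // With copyTo enabled, every date_of_birth is also appended to the root document.
    static List<Map<String, List<String>>> indexAward(String awardDate, List<String> birthDates, boolean copyTo) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        Map<String, List<String>> root = newDoc();
        root.computeIfAbsent("date", k -> new ArrayList<>()).add(awardDate);
        for (String dob : birthDates) {
            Map<String, List<String>> nested = newDoc();
            nested.computeIfAbsent("recipient.date_of_birth", k -> new ArrayList<>()).add(dob);
            docs.add(nested);
            if (copyTo) {
                root.computeIfAbsent("date_of_birth", k -> new ArrayList<>()).add(dob);
            }
        }
        docs.add(root); // in this model the root document is kept last
        return docs;
    }

    static Map<String, List<String>> root(List<Map<String, List<String>>> docs) {
        return docs.get(docs.size() - 1);
    }

    public static void main(String[] args) {
        List<String> dobs = List.of("1991-05-30");
        // Without copy_to, a root-level script only sees "date".
        System.out.println(root(indexAward("2016-06-01", dobs, false)).keySet()); // [date]
        // With copy_to, the root document also carries "date_of_birth".
        System.out.println(root(indexAward("2016-06-01", dobs, true)).keySet()); // [date, date_of_birth]
    }
}
```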
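After that, you can access the date of birth at the root level via the path date_of_birth, although calculating the number of years between the two dates is a bit tricky: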
Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()
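Here I convert the original Joda-Time date objects into java.time.LocalDate objects:
- get the number of milliseconds since 1970-01-01 (getMillis())
- divide it by 86400000L (the number of milliseconds in a day) to get the number of days since 1970-01-01
- convert that day count into a LocalDate object (LocalDate.ofEpochDay)
- create a date-based Period object from the two dates
- get the number of whole years between the two dates (getYears()).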
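The conversion steps above can be checked outside Elasticsearch with a plain Java sketch. Painless uses the same java.time classes, but this is ordinary Java, not a script: the long millisecond values stand in for doc['...'].date.getMillis(), and the class and method names are made up for illustration. Note that this yields the UTC calendar date. For the sample data the plain / is fine, because dates of birth are stored at midnight UTC and the award dates are after 1970; however, / truncates toward zero, so a pre-1970 timestamp that is not exactly at midnight UTC would land one day late. The floor-division variant avoids that; whether Math.floorDiv is available in your Painless version is an assumption worth checking.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.Period;

public class AgeAtAward {
    static final long MILLIS_PER_DAY = 86400000L;

    // Steps 1-3 as in the script: epoch millis / millis per day -> epoch day -> LocalDate.
    // '/' truncates toward zero.
    static LocalDate toLocalDate(long epochMillis) {
        return LocalDate.ofEpochDay(epochMillis / MILLIS_PER_DAY);
    }

    // Variant using floor division, which rounds toward negative infinity instead.
    static LocalDate toLocalDateFloor(long epochMillis) {
        return LocalDate.ofEpochDay(Math.floorDiv(epochMillis, MILLIS_PER_DAY));
    }

    // Steps 4-5: build a date-based Period and take its whole years.
    static int ageInYears(long birthMillis, long awardMillis) {
        return Period.between(toLocalDate(birthMillis), toLocalDate(awardMillis)).getYears();
    }

    static long millis(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    public static void main(String[] args) {
        // Sample data from the curl commands in the question.
        System.out.println(ageInYears(millis("1991-05-30T00:00:00Z"), millis("2016-06-01T16:43:00Z"))); // 25
        System.out.println(ageInYears(millis("1983-01-20T00:00:00Z"), millis("2017-02-28T13:36:00Z"))); // 34

        // Pre-1970 instant that is not at midnight UTC.
        long noonBeforeEpoch = millis("1969-12-31T12:00:00Z");
        System.out.println(toLocalDate(noonBeforeEpoch));      // 1970-01-01
        System.out.println(toLocalDateFloor(noonBeforeEpoch)); // 1969-12-31
    }
}
```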
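So the final aggregation query looks like this (no nested aggregation is needed any more, because both fields are now in the root document):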
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_ranges": {
      "range": {
        "script": {
          "inline": "Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()"
        },
        "keyed": true,
        "ranges": [
          {
            "key": "Under 18",
            "from": 0,
            "to": 18
          },
          {
            "key": "18-24",
            "from": 18,
            "to": 24
          },
          {
            "key": "24-30",
            "from": 24,
            "to": 30
          },
          {
            "key": "30+",
            "from": 30,
            "to": 100
          }
        ]
      }
    }
  }
}
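In a range aggregation, "from" is inclusive and "to" is exclusive, so a recipient who is exactly 18 lands in "18-24", and with "to": 100 the "30+" bucket does not count ages of 100 or more (omitting "to" makes the last bucket open-ended). A plain Java sketch of that bucketing, applied to the ages computed for the two sample recipients (25 and 34); this is what I would expect the query to return for the sample data, not a captured response, and the names are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AgeRangeBuckets {
    // One bucket of a range aggregation: "from" is inclusive, "to" is exclusive.
    static final class Range {
        final String key;
        final double from;
        final double to;

        Range(String key, double from, double to) {
            this.key = key;
            this.from = from;
            this.to = to;
        }

        boolean contains(double value) {
            return value >= from && value < to;
        }
    }

    // The same ranges as in the aggregation request.
    static final List<Range> RANGES = List.of(
            new Range("Under 18", 0, 18),
            new Range("18-24", 18, 24),
            new Range("24-30", 24, 30),
            new Range("30+", 30, 100));

    // Counts ages per bucket, like the keyed response (key -> doc_count).
    // Empty buckets are kept with a count of 0.
    static Map<String, Integer> bucket(List<Integer> ages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Range r : RANGES) {
            counts.put(r.key, 0);
        }
        for (int age : ages) {
            for (Range r : RANGES) {
                if (r.contains(age)) {
                    counts.merge(r.key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Ages of the two sample recipients when they received the award.
        System.out.println(bucket(List.of(25, 34))); // {Under 18=0, 18-24=0, 24-30=1, 30+=1}
    }
}
```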