Elasticsearch: full-text search and filtering by nested array of objects
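I have a task to build a GUI table based on data from N joined tables in PostgreSQL.
This GUI table needs sorting and filtering with full-text search.
I want to use Elasticsearch for this purpose. I prepared this data structure for Elasticsearch: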
{
  did_user_read: true,
  view_info: {
    total: 1,
    users: [
      { name: 'John Smith', read_at: '2020-02-04 11:00:01', is_current_user: false },
      { name: 'Samuel Jackson', read_at: '2020-02-04 11:00:01', is_current_user: true },
    ],
  },
  is_favorite: true,
  has_attachments: true,
  from: {
    short_name: 'You',
    full_name: 'Chuck Norris',
    email: 'ch.norris@example.com',
    is_current_user: true
  },
  subject: 'The secret of the appearance of navel lints',
  received_at: '2020-02-04 11:00:01'
}
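Please advise how to index this structure correctly so that I can filter and search by nested objects and by nested arrays of objects.
For example, I want to get all records that match these conditions: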
is_favorite IS false
AND
FULL_TEXT_SEARCH("sam jackson")
BY FIELDS
users.name, -- inside of array(!)
from.full_name,
from.short_name
AND
users.is_current_user IS NOT false
AND
ORDER BY received_at DESC
The Elasticsearch index mapping for the data structure above should be:
Mapping:
{
  "mappings": {
    "properties": {
      "did_user_read": {
        "type": "boolean"
      },
      "view_info": {
        "properties": {
          "total": {
            "type": "integer"
          },
          "users": {
            "properties": {
              "name": {
                "type": "text"
              },
              "read_at": {
                "type": "date",
                "format": "date_hour_minute_second"
              },
              "is_current_user": {
                "type": "boolean"
              }
            }
          }
        }
      },
      "is_favorite": {
        "type": "boolean"
      },
      "has_attachments": {
        "type": "boolean"
      },
      "from": {
        "properties": {
          "short_name": {
            "type": "text"
          },
          "full_name": {
            "type": "text"
          },
          "email": {
            "type": "keyword"
          },
          "is_current_user": {
            "type": "boolean"
          }
        }
      },
      "subject": {
        "type": "text"
      },
      "received_at": {
        "type": "date",
        "format": "date_hour_minute_second"
      }
    }
  }
}
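One caveat worth illustrating: in the mapping above, view_info.users is a plain object field, and Elasticsearch flattens an array of objects into one list of values per field, so the link between name and is_current_user inside a single array element is not kept. The snippet below is only a plain-Python simulation of that behaviour, not Elasticsearch code; flatten_object_array is a hypothetical helper name. If conditions must hold for the same user, the documented alternative is to declare users as "type": "nested" and query it with a nested query.

```python
from typing import Any


def flatten_object_array(items: list[dict[str, Any]], prefix: str) -> dict[str, list[Any]]:
    """Mimic how an array of objects is indexed under the default object mapping."""
    flat: dict[str, list[Any]] = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat


users = [
    {"name": "John Smith", "is_current_user": False},
    {"name": "Samuel Jackson", "is_current_user": True},
]

flat = flatten_object_array(users, "view_info.users")

# Object mapping: each condition is checked against the whole value list,
# so "John Smith" and is_current_user == True may come from different users.
object_style_match = (
    "John Smith" in flat["view_info.users.name"]
    and True in flat["view_info.users.is_current_user"]
)

# Nested mapping: both conditions must hold inside one array element.
nested_style_match = any(
    u["name"] == "John Smith" and u["is_current_user"] for u in users
)

print(object_style_match)  # True
print(nested_style_match)  # False
```

For the query in the question this difference only matters if the name match and the is_current_user flag are expected to refer to the same user.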
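I indexed a few documents in the same format as the one in your example.
The search query for the conditions in the question should be:
Search query: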
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "is_favorite": false
          }
        },
        {
          "term": {
            "view_info.users.is_current_user": true
          }
        }
      ],
      "must": {
        "multi_match": {
          "query": "sam jackson",
          "fields": [
            "view_info.users.name",
            "from.full_name",
            "from.short_name"
          ]
        }
      }
    }
  },
  "sort": [
    {
      "received_at": {
        "order": "desc"
      }
    }
  ]
}
Output:
"hits": [
  {
    "_index": "topics",
    "_type": "_doc",
    "_id": "3",
    "_score": null,
    "_source": {
      "did_user_read": true,
      "view_info": {
        "total": 1,
        "users": [
          {
            "name": "John Smith",
            "read_at": "2020-02-04T11:00:01",
            "is_current_user": false
          },
          {
            "name": "Samuel Jackson",
            "read_at": "2020-02-04T11:00:01",
            "is_current_user": true
          }
        ]
      },
      "is_favorite": false,
      "has_attachments": true,
      "from": {
        "short_name": "You",
        "full_name": "Chuck Norris",
        "email": "ch.norris@example.com",
        "is_current_user": true
      },
      "subject": "The secret of the appearance of navel lints",
      "received_at": "2020-02-04T11:00:03"
    },
    "sort": [
      1580814003000
    ]
  },
  {
    "_index": "topics",
    "_type": "_doc",
    "_id": "2",
    "_score": null,
    "_source": {
      "did_user_read": true,
      "view_info": {
        "total": 1,
        "users": [
          {
            "name": "John Smith",
            "read_at": "2020-02-04T11:00:01",
            "is_current_user": false
          },
          {
            "name": "Samuel Jackson",
            "read_at": "2020-02-04T11:00:01",
            "is_current_user": true
          }
        ]
      },
      "is_favorite": false,
      "has_attachments": true,
      "from": {
        "short_name": "You",
        "full_name": "Chuck Norris",
        "email": "ch.norris@example.com",
        "is_current_user": true
      },
      "subject": "The secret of the appearance of navel lints",
      "received_at": "2020-02-04T11:00:01"
    },
    "sort": [
      1580814001000
    ]
  }
]
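Explanation:
Based on the conditions in your question, the search query is built as follows:
is_favorite IS false AND users.is_current_user IS NOT false
These two conditions go into the filter clause of the bool query. A filter is used when documents must satisfy a condition but that condition should not contribute to the relevance score. Both fields are booleans, so each condition is a plain yes or no and there is nothing to score. Note that "IS NOT false" is expressed here as a term filter on true, so a document in which the field is missing will not match.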
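FULL_TEXT_SEARCH("sam jackson")
BY FIELDS
users.name, -- inside of array(!)
from.full_name,
from.short_name
Here we want to search for sam jackson across these three fields, so a multi_match query is used in the must clause. With its default settings, a document matches when at least one of the query terms is found in at least one of the listed fields.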
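To see why the sample documents match, here is a rough simulation of a multi_match with the default OR operator over text fields. It is a sketch only: lowercasing and splitting into word tokens merely approximates the standard analyzer, and analyze and multi_match_or are hypothetical helper names, not Elasticsearch APIs.

```python
import re


def analyze(text: str) -> set[str]:
    """Very rough stand-in for the standard analyzer: lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def multi_match_or(query: str, field_values: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return, per field, the query terms found in that field (OR semantics)."""
    terms = analyze(query)
    hits: dict[str, set[str]] = {}
    for field, values in field_values.items():
        tokens: set[str] = set()
        for value in values:
            tokens |= analyze(value)
        matched = terms & tokens
        if matched:
            hits[field] = matched
    return hits


# Text fields of the sample document; array values are listed per field.
doc_fields = {
    "view_info.users.name": ["John Smith", "Samuel Jackson"],
    "from.full_name": ["Chuck Norris"],
    "from.short_name": ["You"],
}

hits = multi_match_or("sam jackson", doc_fields)
print(hits)  # {'view_info.users.name': {'jackson'}}
```

In this simulation the document matches only through the term jackson; sam is a different token from samuel. If prefix matching of that kind is required, the mapping would need something beyond what is configured above, for example an edge n-gram analyzer.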
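The filter and must clauses sit inside the same bool query because the conditions are joined by AND.
ORDER BY received_at DESC
For this, the sort parameter of the search request is used. Because the hits are sorted by a field rather than by relevance, _score is null in the output.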
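If the conditions on view_info.users have to be evaluated per array element, one option is to map users as "type": "nested" and wrap the user conditions in nested queries. The following is only a sketch under that assumption: the mapping given in this answer uses a plain object field, so this body would not work against it as is. build_nested_query is a hypothetical helper name.

```python
import json


def build_nested_query(text: str) -> dict:
    """Build a search body, ASSUMING view_info.users is mapped as "nested"."""
    users_path = "view_info.users"
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"is_favorite": False}},
                    {
                        # At least one user element has is_current_user == true
                        "nested": {
                            "path": users_path,
                            "query": {"term": {f"{users_path}.is_current_user": True}},
                        }
                    },
                ],
                "must": {
                    "bool": {
                        # Full-text match in a user's name OR in the sender fields
                        "should": [
                            {
                                "nested": {
                                    "path": users_path,
                                    "query": {"match": {f"{users_path}.name": text}},
                                }
                            },
                            {
                                "multi_match": {
                                    "query": text,
                                    "fields": ["from.full_name", "from.short_name"],
                                }
                            },
                        ],
                        "minimum_should_match": 1,
                    }
                },
            }
        },
        "sort": [{"received_at": {"order": "desc"}}],
    }


body = build_nested_query("sam jackson")
print(json.dumps(body, indent=2))
```

In this sketch the is_current_user filter and the name match are separate nested clauses, so they may be satisfied by different users. To require both on the same user, they would go into one bool query inside a single nested clause.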
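Note: you have to change the date-time values in your data, such as read_at and received_at. They are currently in the format 2020-02-04 11:00:01. Change them to 2020-02-04T11:00:01 (a T instead of the space) before indexing, because the date_hour_minute_second format declared in the mapping expects that separator, and Elasticsearch only accepts date-times in the formats configured for the field. The supported formats are listed here: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html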
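A minimal sketch of that conversion, assuming the values arrive as strings in exactly the format shown in the question; to_es_datetime and convert_document are hypothetical helper names.

```python
from datetime import datetime


def to_es_datetime(value: str) -> str:
    """Convert '2020-02-04 11:00:01' to '2020-02-04T11:00:01'."""
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


def convert_document(doc: dict) -> dict:
    """Return a copy of the document with received_at and users[].read_at converted."""
    users = [
        {**user, "read_at": to_es_datetime(user["read_at"])}
        for user in doc["view_info"]["users"]
    ]
    return {
        **doc,
        "view_info": {**doc["view_info"], "users": users},
        "received_at": to_es_datetime(doc["received_at"]),
    }


source_doc = {
    "view_info": {
        "total": 1,
        "users": [
            {"name": "John Smith", "read_at": "2020-02-04 11:00:01", "is_current_user": False},
        ],
    },
    "received_at": "2020-02-04 11:00:01",
}

converted = convert_document(source_doc)
print(converted["received_at"])  # 2020-02-04T11:00:01
```

strptime raises ValueError on a value that does not match the expected format, which surfaces bad input before it reaches the index.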