Elasticsearch query with nested sets
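I'm pretty new to Elasticsearch, so please bear with me and let me know if I need to provide any additional information. I've inherited a project and need to implement new search functionality. The document/mapping structure is already in place, but it can be changed if it cannot facilitate what I am trying to achieve. I am using Elasticsearch version 5.6.16.
A company is able to offer a number of services. Each service offering is grouped together in a set. Each set is composed of 3 categories;
- Product(s) (ID 1)
- Process(es) (ID 3)
- Material(s) (ID 4)
The document structure looks like;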
[{
"id": 4485,
"name": "Company A",
// ...
"services": {
"595": {
"1": [
95, 97, 91
],
"3": [
475, 476, 471
],
"4": [
644, 645, 683
]
},
"596": {
"1": [
91, 89, 76
],
"3": [
476, 476, 301
],
"4": [
644, 647, 555
]
},
"597": {
"1": [
92, 93, 89
],
"3": [
473, 472, 576
],
"4": [
641, 645, 454
]
}
}
}]
In the example above; 595, 596 and 597 are IDs relating to the set. 1, 3 and 4 relate to the categories (mentioned above).
The mapping looks like;
[{
"id": {
"type": "long"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"services": {
"properties": {
// ...
"595": {
"properties": {
"1": {"type": "long"},
"3": {"type": "long"},
"4": {"type": "long"}
}
},
"596": {
"properties": {
"1": {"type": "long"},
"3": {"type": "long"},
"4": {"type": "long"}
}
},
// ...
}
}
}]
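When searching for companies that provide Products (ID 1), a search for 91 and 95 would return Company A because those IDs are within the same set. But if I were to search for 95 and 76, it would not return Company A: while the company does offer both of these products, they are not in the same set. The same rules would apply when searching Processes and Materials, or a combination of these.
I am looking for confirmation that the current document/mapping structure will facilitate this type of search.
- If so, given 3 arrays of IDs (Products, Processes and Materials), what is the JSON to find all companies that provide these services within the same set?
- If not, how should the document/mapping be changed to allow this search?
Thank you for your help.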
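To make the matching rule concrete, here is a minimal Python sketch of it applied to the `services` object above, outside Elasticsearch. The function name `company_matches` and its parameters are illustrative assumptions, not part of the existing project.

```python
def company_matches(services, products=(), processes=(), materials=()):
    """Return True if a single set contains every requested ID."""
    # Category IDs as described above: 1 = products, 3 = processes, 4 = materials.
    wanted = {"1": set(products), "3": set(processes), "4": set(materials)}
    for categories in services.values():
        # Every requested ID of every category must be present in this one set.
        if all(ids <= set(categories.get(cat, [])) for cat, ids in wanted.items()):
            return True
    return False


services = {
    "595": {"1": [95, 97, 91], "3": [475, 476, 471], "4": [644, 645, 683]},
    "596": {"1": [91, 89, 76], "3": [476, 476, 301], "4": [644, 647, 555]},
    "597": {"1": [92, 93, 89], "3": [473, 472, 576], "4": [641, 645, 454]},
}

print(company_matches(services, products=[91, 95]))  # True: both are in set 595
print(company_matches(services, products=[95, 76]))  # False: 95 is in 595, 76 is in 596
```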
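Having IDs appear as the field names themselves is a bad idea, because every new ID adds new fields to the mapping and that could lead to the creation of a great many inverted indexes (remember that in Elasticsearch an inverted index is created for every field). I don't think that is reasonable.
Instead, change your data model to something like the below. I have also included a sample document, a possible query you can apply, and how the response would appear.
Note that, for the sake of simplicity, I'm focusing only on the services field from your mapping.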
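Mapping (shown in the typeless syntax of Elasticsearch 7.x; on 5.6.16 the properties block has to be placed under a mapping type name):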
PUT my_services_index
{
"mappings": {
"properties": {
"services":{
"type": "nested", // note the nested type
"properties": {
"service_key":{
"type": "keyword" // keyword is used here; feel free to use text and keyword if you plan to implement partial + exact search
},
"product_key": {
"type": "keyword"
},
"product_values": {
"type": "keyword"
},
"process_key":{
"type": "keyword"
},
"process_values":{
"type": "keyword"
},
"material_key":{
"type": "keyword"
},
"material_values":{
"type": "keyword"
}
}
}
}
}
}
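Note that I've used the nested datatype. I'd suggest reading its documentation to understand why it is needed here instead of the plain object type: an array of plain objects is flattened into multi-valued fields, so values that belong to different sets could satisfy the same query.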
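The difference can be illustrated with a small Python simulation. This is only a sketch of the flattening idea, not Elasticsearch code; the helper name `flatten_object_array` is hypothetical, and the values are the product IDs from the question.

```python
def flatten_object_array(services):
    """Mimic how a plain object field treats an array of objects:
    each sub-field becomes one multi-valued field on the parent document."""
    flat = {}
    for service in services:
        for field, value in service.items():
            values = value if isinstance(value, list) else [value]
            flat.setdefault("services." + field, set()).update(values)
    return flat


services = [
    {"service_key": "595", "product_values": ["95", "97", "91"]},
    {"service_key": "596", "product_values": ["91", "89", "76"]},
]

flat = flatten_object_array(services)

# Flattened view: 95 and 76 end up in the same field, so "95 AND 76" would
# match even though the two IDs belong to different sets.
print({"95", "76"} <= flat["services.product_values"])  # True (a false positive)

# Nested view: every object is checked on its own, so no single set has both.
print(any({"95", "76"} <= set(s["product_values"]) for s in services))  # False
```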
Sample document:
POST my_services_index/_doc/1
{
"services":[
{
"service_key": "595",
"process_key": "1",
"process_values": ["95", "97", "91"],
"product_key": "3",
"product_values": ["475", "476", "471"],
"material_key": "4",
"material_values": ["644", "645", "643"]
},
{
"service_key": "596",
"process_key": "1",
"process_values": ["91", "89", "75"],
"product_key": "3",
"product_values": ["476", "476", "301"],
"material_key": "4",
"material_values": ["644", "647", "555"]
}
]
}
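Notice how you can now manage your data if it ends up having multiple combinations of product_key, process_key and material_key.
The way to interpret the above document is that there are two nested documents inside one document of my_services_index.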
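If the existing documents need to be reindexed into this shape, the conversion could look roughly like the following Python sketch. The function name `to_nested_services` and the `CATEGORY_FIELDS` map are assumptions made for illustration. The map follows the categories given in the question (1 = products, 3 = processes, 4 = materials), whereas the sample document above labels key 1 as process and key 3 as product, so adjust it to whichever naming you settle on.

```python
# Category ID -> field name prefix, following the question's categories (assumption).
CATEGORY_FIELDS = {"1": "product", "3": "process", "4": "material"}


def to_nested_services(services):
    """Turn {"595": {"1": [...], "3": [...], "4": [...]}} into a list of
    objects suitable for a nested `services` field."""
    nested = []
    for service_key, categories in services.items():
        entry = {"service_key": service_key}
        for category_id, name in CATEGORY_FIELDS.items():
            entry[name + "_key"] = category_id
            # The mapping uses keyword fields, so the IDs are stored as strings.
            entry[name + "_values"] = [str(v) for v in categories.get(category_id, [])]
        nested.append(entry)
    return nested


old = {"595": {"1": [95, 97, 91], "3": [475, 476, 471], "4": [644, 645, 683]}}
print(to_nested_services(old))
```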
Sample query:
POST my_services_index/_search
{
"query": {
"bool": {
"must": [
{
"nested": { // note the nested query
"path": "services",
"query": {
"bool": {
"must": [
{
"term": {
"services.service_key": "595"
}
},
{
"term": {
"services.process_key": "1"
}
},
{
"term": {
"services.process_values": "95"
}
}
]
}
},
"inner_hits": {} // note this: returns the nested documents that matched
}
}
]
}
}
}
Note that I've used the Nested Query.
Response:
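For the case asked about in the question (three arrays of IDs that must all be found within the same set), the request body could be generated along these lines. This is a hedged sketch: the function name `build_same_set_query` is hypothetical, and it assumes the nested mapping and field names shown above. Because all term clauses sit inside one nested query, they have to be satisfied by a single nested document, that is, by one set.

```python
import json


def build_same_set_query(products=(), processes=(), materials=()):
    """Build a search body matching documents where one nested `services`
    entry contains every given product, process and material ID."""
    fields = {
        "services.product_values": products,
        "services.process_values": processes,
        "services.material_values": materials,
    }
    # One term clause per requested ID; keyword fields hold strings.
    clauses = [
        {"term": {field: str(value)}}
        for field, values in fields.items()
        for value in values
    ]
    return {
        "query": {
            "nested": {
                "path": "services",
                "query": {"bool": {"must": clauses}},
                "inner_hits": {},
            }
        }
    }


body = build_same_set_query(products=[91, 95], materials=[644])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index.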
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.828546,
"hits" : [ // note this: the original (parent) document is returned here
{
"_index" : "my_services_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.828546,
"_source" : {
"services" : [
{
"service_key" : "595",
"process_key" : "1",
"process_values" : [
"95",
"97",
"91"
],
"product_key" : "3",
"product_values" : [
"475",
"476",
"471"
],
"material_key" : "4",
"material_values" : [
"644",
"645",
"643"
]
},
{
"service_key" : "596",
"process_key" : "1",
"process_values" : [
"91",
"89",
"75"
],
"product_key" : "3",
"product_values" : [
"476",
"476",
"301"
],
"material_key" : "4",
"material_values" : [
"644",
"647",
"555"
]
}
]
},
"inner_hits" : { // note this: tells you which inner document was a hit
"services" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.828546,
"hits" : [
{
"_index" : "my_services_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "services",
"offset" : 0
},
"_score" : 1.828546,
"_source" : {
"service_key" : "595",
"process_key" : "1",
"process_values" : [
"95",
"97",
"91"
],
"product_key" : "3",
"product_values" : [
"475",
"476",
"471"
],
"material_key" : "4",
"material_values" : [
"644",
"645",
"643"
]
}
}
]
}
}
}
}
]
}
}
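To read the matching sets out of such a response, something like the following Python sketch could be used. The helper name `matched_sets` is an assumption, and the `response` literal is a trimmed-down copy of the structure shown above rather than real output.

```python
def matched_sets(response):
    """Map each returned document _id to the service_key values of the
    nested documents reported under inner_hits."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = (
            hit.get("inner_hits", {})
            .get("services", {})
            .get("hits", {})
            .get("hits", [])
        )
        result[hit["_id"]] = [h["_source"]["service_key"] for h in inner]
    return result


# Trimmed-down response with the same nesting as the one above.
response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "services": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "services", "offset": 0},
                                    "_source": {"service_key": "595"},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(matched_sets(response))  # {'1': ['595']}
```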
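Note that I've used the keyword datatype. Feel free to choose whichever datatype your business requirements call for on each field.
The idea I've provided is meant to help you understand the document model.
Hope this helps!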