Elasticsearch query with nested sets

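I'm new to Elasticsearch, so please bear with me and let me know if I need to provide any additional information. I have inherited a project and need to implement new search functionality. The document/mapping structure is already in place, but it can be changed if it cannot support what I am trying to achieve. I am using Elasticsearch version 5.6.16.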

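A company can offer a number of services. Each service offering is grouped into a set. Each set is made up of 3 categories: Product, Process and Material.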

The document structure looks like this:

[{
  "id": 4485,
  "name": "Company A",
  // ...
  "services": {
    "595": {
      "1": [
        95, 97, 91
      ],
      "3": [
        475, 476, 471
      ],
      "4": [
        644, 645, 683
      ]
    },
    "596": {
      "1": [
        91, 89, 76
      ],
      "3": [
        476, 476, 301
      ],
      "4": [
        644, 647, 555
      ]
    },
    "597": {
      "1": [
        92, 93, 89
      ],
      "3": [
        473, 472, 576
      ],
      "4": [
        641, 645, 454
      ]
    }
  }
}]

In the example above, 595, 596 and 597 are the IDs of the sets. 1, 3 and 4 are the IDs of the categories (as mentioned above).

The mapping looks like this:

[{
  "id": {
    "type": "long"
  },
  "name": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  },
  "services": {
    "properties": {
      // ...
      "595": {
        "properties": {
          "1": {"type": "long"},
          "3": {"type": "long"},
          "4": {"type": "long"}
        }
      },
      "596": {
        "properties": {
          "1": {"type": "long"},
          "3": {"type": "long"},
          "4": {"type": "long"}
        }
      },
      // ...
    }
  }
}]

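When searching for companies that offer Products (ID 1), a search for 91 and 95 would return Company A, because those IDs are within the same set. But if I were to search for 95 and 76, it would not return Company A: the company does make both products, but they are not in the same set. The same rules would apply when searching Processes and Materials, or a combination of these.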

I am looking for confirmation that the current document/mapping structure will support this type of search.

Thanks for your help.

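Having IDs, which are really values, appear as field names is a bad idea: it can lead to a very large number of inverted indexes (remember that Elasticsearch builds an inverted index for every field), and I don't think a structure like that is reasonable.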
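
To make the cost concrete, here is a minimal Python sketch (not part of your setup; leaf_field_paths is a hypothetical helper) that lists the leaf field paths Elasticsearch would have to map for the current document shape. Every new set ID adds three more fields.

```python
def leaf_field_paths(obj, prefix=""):
    """Return the dotted paths of all leaf fields in a JSON-like dict."""
    paths = []
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(leaf_field_paths(value, path))
        else:
            # A list of scalars maps to a single field in Elasticsearch.
            paths.append(path)
    return paths


# Trimmed copy of the document from the question.
company = {
    "id": 4485,
    "name": "Company A",
    "services": {
        "595": {"1": [95, 97, 91], "3": [475, 476, 471], "4": [644, 645, 683]},
        "596": {"1": [91, 89, 76], "3": [476, 476, 301], "4": [644, 647, 555]},
        "597": {"1": [92, 93, 89], "3": [473, 472, 576], "4": [641, 645, 454]},
    },
}

service_fields = [p for p in leaf_field_paths(company) if p.startswith("services.")]
print(len(service_fields))   # 9 fields for only three sets
print(service_fields[:3])    # ['services.595.1', 'services.595.3', 'services.595.4']
```

With a few hundred distinct set IDs across all companies, the index would run into the default limit of 1000 fields (index.mapping.total_fields.limit).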

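Instead, change your data model to something like the one below. I've also provided a sample document, a possible query you can apply, and how the response would look.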

Note that, for the sake of simplicity, I'm focusing only on the services field you mentioned in your mapping.

The requests below use the typeless syntax of Elasticsearch 7.x. On 5.6 the properties block has to be wrapped in a mapping type name, and that type name is used in place of _doc in the document URL.

Mapping:

PUT my_services_index
{
  "mappings": {
    "properties": {
      "services":{
        "type": "nested",                   <----- Note this
        "properties": {
          "service_key":{
            "type": "keyword"               <----- Note that I have used keyword here. Feel free to use text with a keyword sub-field if you plan to implement partial + exact search.
          },
          "product_key": {
            "type": "keyword"
          },
          "product_values": {
            "type": "keyword"
          },
          "process_key":{
            "type": "keyword"
          },
          "process_values":{
            "type": "keyword"
          },
          "material_key":{
            "type": "keyword"
          },
          "material_values":{
            "type": "keyword"
          }
        }
      }
    }
  }
}

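Note that I have used the nested datatype. I'd suggest reading its documentation to understand why we need it instead of the plain object type. In short, with the object type an array of objects is flattened into one list of values per field, so 95 from one set and 75 from another would match as if they belonged together; with nested, each set is indexed as its own hidden document and is matched on its own.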
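
The difference can be illustrated without a cluster. The following Python sketch only simulates the two matching behaviours (it is not how Elasticsearch is implemented, and the function names are made up for illustration): with object, the arrays of all sets are merged per field; with nested, each set is checked on its own.

```python
services = [
    {"service_key": "595", "process_values": ["95", "97", "91"]},
    {"service_key": "596", "process_values": ["91", "89", "75"]},
]


def matches_as_object(sets, field, wanted):
    """Simulate the object type: values of all sets are flattened into one list."""
    flattened = {value for s in sets for value in s[field]}
    return set(wanted) <= flattened


def matches_as_nested(sets, field, wanted):
    """Simulate the nested type: all wanted values must occur in a single set."""
    return any(set(wanted) <= set(s[field]) for s in sets)


# 95 is only in set 595 and 75 is only in set 596.
print(matches_as_object(services, "process_values", ["95", "75"]))  # True (false positive)
print(matches_as_nested(services, "process_values", ["95", "75"]))  # False
print(matches_as_nested(services, "process_values", ["95", "91"]))  # True
```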

Sample document:

POST my_services_index/_doc/1
{
  "services":[
  {
    "service_key": "595",
    "process_key": "1",
    "process_values": ["95", "97", "91"],
    "product_key": "3",
    "product_values": ["475", "476", "471"],
    "material_key": "4",
    "material_values": ["644", "645", "643"]
  },
  {
    "service_key": "596",
    "process_key": "1",
    "process_values": ["91", "89", "75"],
    "product_key": "3",
    "product_values": ["476", "476", "301"],
    "material_key": "4",
    "material_values": ["644", "647", "555"]
  }
  ]
}

Notice how you can now manage your data even if it ends up with multiple combinations of product_key, process_key and material_key.

The way to read the document above is that a single document in my_services_index contains two nested documents.
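
If you need to reindex existing data, the conversion from the old shape is mechanical. A rough Python sketch, assuming the category IDs map to names as in the sample document above (1 to process, 3 to product, 4 to material); CATEGORY_NAMES and to_nested_services are hypothetical names, so adjust them to your real categories.

```python
# Assumed mapping of category IDs to field name prefixes (taken from the sample document).
CATEGORY_NAMES = {"1": "process", "3": "product", "4": "material"}


def to_nested_services(old_services):
    """Turn {"595": {"1": [...], ...}, ...} into a list of nested service objects."""
    nested = []
    for service_key, categories in old_services.items():
        entry = {"service_key": service_key}
        for category_id, values in categories.items():
            name = CATEGORY_NAMES[category_id]
            entry[f"{name}_key"] = category_id
            # The mapping uses keyword fields, so the IDs are stored as strings.
            entry[f"{name}_values"] = [str(v) for v in values]
        nested.append(entry)
    return nested


old = {"595": {"1": [95, 97, 91], "3": [475, 476, 471], "4": [644, 645, 643]}}
print(to_nested_services(old))
```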

Sample query:

POST my_services_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {                                      <---- Note this
            "path": "services",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "services.service_key": "595"
                    }
                  },
                  {
                    "term": {
                      "services.process_key": "1"
                    }
                  },
                  {
                    "term": {
                      "services.process_values": "95"
                    }
                  }
                ]
              }
            },
            "inner_hits": {}                              <---- Note this
          }
        }
      ]
    }
  }
}

Note that I have used a Nested Query.
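
For the search described in the question (several IDs that must all belong to the same set), you would put one term clause per ID inside the same nested query. A sketch in Python that only builds the request body as a dict; same_set_query is a hypothetical helper, and sending the body with a client is left out.

```python
import json


def same_set_query(values_field, wanted_values):
    """Build a nested query matching documents where one set holds all wanted values."""
    return {
        "query": {
            "nested": {
                "path": "services",
                "query": {
                    "bool": {
                        # One term clause per value; all must match in the same nested object.
                        "must": [
                            {"term": {f"services.{values_field}": value}}
                            for value in wanted_values
                        ]
                    }
                },
                "inner_hits": {},
            }
        }
    }


body = same_set_query("process_values", ["91", "95"])
print(json.dumps(body, indent=2))
```

Against the sample document this should match through set 595. With ["95", "75"] no single set contains both values, so the document should not be returned. To combine categories, add term clauses on the other value fields to the same must list.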

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.828546,
    "hits" : [                              <---- Note this: the original document is returned here.
      {
        "_index" : "my_services_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.828546,
        "_source" : {
          "services" : [
            {
              "service_key" : "595",
              "process_key" : "1",
              "process_values" : [
                "95",
                "97",
                "91"
              ],
              "product_key" : "3",
              "product_values" : [
                "475",
                "476",
                "471"
              ],
              "material_key" : "4",
              "material_values" : [
                "644",
                "645",
                "643"
              ]
            },
            {
              "service_key" : "596",
              "process_key" : "1",
              "process_values" : [
                "91",
                "89",
                "75"
              ],
              "product_key" : "3",
              "product_values" : [
                "476",
                "476",
                "301"
              ],
              "material_key" : "4",
              "material_values" : [
                "644",
                "647",
                "555"
              ]
            }
          ]
        },
        "inner_hits" : {                    <--- Note this, which would tell you which inner document has been a hit. 
          "services" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 1.828546,
              "hits" : [
                {
                  "_index" : "my_services_index",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "services",
                    "offset" : 0
                  },
                  "_score" : 1.828546,
                  "_source" : {
                    "service_key" : "595",
                    "process_key" : "1",
                    "process_values" : [
                      "95",
                      "97",
                      "91"
                    ],
                    "product_key" : "3",
                    "product_values" : [
                      "475",
                      "476",
                      "471"
                    ],
                    "material_key" : "4",
                    "material_values" : [
                      "644",
                      "645",
                      "643"
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
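
To find out which sets matched, read service_key from each inner hit. A small Python sketch, assuming the response has been parsed into a dict shaped like the one above (matched_service_keys is a hypothetical helper):

```python
def matched_service_keys(response):
    """Map each hit's _id to the service_key values of its matching nested objects."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get("services", {})
        inner_hits = inner.get("hits", {}).get("hits", [])
        result[hit["_id"]] = [h["_source"]["service_key"] for h in inner_hits]
    return result


# Heavily trimmed version of the response above.
response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "services": {
                        "hits": {"hits": [{"_source": {"service_key": "595"}}]}
                    }
                },
            }
        ]
    }
}
print(matched_service_keys(response))  # {'1': ['595']}
```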

Note that I have used the keyword datatype. Feel free to choose whatever datatype your business requirements call for on each field.

The idea I've provided is meant to help you understand the document model.

Hope this helps!