Elasticsearch: adding a text field to my aggregation

I have article information like this in Elasticsearch:

{
   "ArticleId":355027,
   "ArticleNumber":"433398",
   "CharacteristicsMultiValue":[
      {
         "Name":"Aantal cartridges",
         "Value":"4",
         "NumValue":4,
         "Priority":2147483647
      },
      {
         "Name":"ADF",
         "Value":"Ja",
         "Priority":10,
         "Description":"Een Automatic Document Feeder (ADF), of automatische documentinvoer, laat een multifunctionele printer (all-in-one) automatisch meerdere vellen na elkaar verwerken. Door meerdere vellen in de ADF te plaatsen, wordt ieder vel papier stuk voor stuk automatisch gekopieerd of gescand."
      },
      {
         "Name":"Scanresolutie",
         "Value":"600x600 DPI",
         "Priority":2147483647
      }
   ]
}

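I run the following query to retrieve every CharacteristicsMultiValue entry that matches my search, together with all of its possible values, and sort the results to my liking.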

{
  "query": {
    "query_string": {
     "query": "433398",
     "default_operator": "and"
    }
  },
  "aggs":{
    "CharacteristicsMultiValue":{
      "nested":{
        "path":"CharacteristicsMultiValue"
       },
       "aggs":{
         "Name":{
           "terms":{
            "field":"CharacteristicsMultiValue.Name",
            "size":25
          },
          "aggs":{
            "Value":{
              "terms":{
                "field":"CharacteristicsMultiValue.Value",
                "size":25
              }
            }, 
            "Priority":{
              "avg":{
                "field":"CharacteristicsMultiValue.Priority"
              }
            },
            "Characteristics_sort": {
              "bucket_sort": {
                "sort": [
                  { "Priority": { "order": "asc" } } 
                ]                               
              }
            }       
          }
        }
      }
    }
  }
}

The result contains a list of CharacteristicsMultiValue buckets like the one below.

{
   "key":"ADF",
   "doc_count":1,
   "Priority":{
      "value":10
   },
   "Value":{
      "doc_count_error_upper_bound":0,
      "sum_other_doc_count":0,
      "buckets":[
         {
            "key":"Ja",
            "doc_count":1
         }
      ]
   }
}

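This all works fine. I would like to change it so that the CharacteristicsMultiValue.Description field is included in the aggregation. I'm not very experienced with Elasticsearch, but I feel like this should be easy to do.

I did some research and, as far as I can tell, I need to add a new sub-aggregation for the Description field. I tried adding the JSON below to my current query in several places, but I keep getting 404 errors. Can anyone tell me how to add the (first found) Description field to my aggregation?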

"aggs":{
    "Description":{
        "terms":{
            "field":"CharacteristicsMultiValue.Description",
            "size":1
        }
    }
}

I tested the solution Joe proposed. It results in the following error response:

{ 
  "error": { 
    "root_cause": [ 
      {
        "type": "illegal_argument_exception",
        "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "articles_dev1_nl",
        "node": "HiGH6JY9QvOozRSWJmFXpw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    }
  },
  "status": 400
}

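I don't know why you'd be getting a 404 error; if your aggregation syntax is incorrect, the response is usually a 400 Bad Request.

Either way, if you want the top Description term under each Value bucket, you could use the query below. The new part is the Description sub-aggregation nested under Value: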

{
  "query": {
    "query_string": {
      "query": "433398",
      "default_operator": "and"
    }
  },
  "aggs": {
    "CharacteristicsMultiValue": {
      "nested": {
        "path": "CharacteristicsMultiValue"
      },
      "aggs": {
        "Name": {
          "terms": {
            "field": "CharacteristicsMultiValue.Name",
            "size": 25
          },
          "aggs": {
            "Value": {
              "terms": {
                "field": "CharacteristicsMultiValue.Value",
                "size": 25
              },
              "aggs": {
                "Description": {
                  "terms": {
                    "field": "CharacteristicsMultiValue.Description",
                    "size": 1
                  }
                }
              }
            },
            "Priority": {
              "avg": {
                "field": "CharacteristicsMultiValue.Priority"
              }
            },
            "Characteristics_sort": {
              "bucket_sort": {
                "sort": [
                  {
                    "Priority": {
                      "order": "asc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}

In general, sub-aggregations follow this pattern:

{
  "query": { },  // optional query
  "aggs": {
    "your_agg_name": {
      "agg_type": {
        // agg spec
      },
      "aggs": {
        "your_sub_agg_name_1": {
          "agg_type": {
            // agg spec
          }
        },
        "your_sub_agg_name_2_if_needed": {
          "agg_type": {
            // agg spec
          }
        },
        ...
      }
    }
  }
}

You can either:

  • nest the sub-aggs further, as you've already done with Name -> Value and as I did in my example with Value -> Description,
  • or keep them at the same level, like Name -> Value and Name -> Priority.

Tip: your query is already deeply nested, so you could explore the typed_keys query parameter to more easily tell which bucket corresponds to which sub-aggregation.
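
As a rough illustration of that tip, the flag goes on the search URL. The index name articles_dev1_nl is taken from the error response in the question; the aggregation is trimmed down and `"size": 0` is added only to keep this sketch short.

```json
GET articles_dev1_nl/_search?typed_keys
{
  "size": 0,
  "aggs": {
    "CharacteristicsMultiValue": {
      "nested": { "path": "CharacteristicsMultiValue" },
      "aggs": {
        "Name": {
          "terms": { "field": "CharacteristicsMultiValue.Name", "size": 25 },
          "aggs": {
            "Priority": {
              "avg": { "field": "CharacteristicsMultiValue.Priority" }
            }
          }
        }
      }
    }
  }
}
```

With the flag set, each aggregation name in the response is prefixed with its type, for example nested#CharacteristicsMultiValue, sterms#Name (assuming Name is mapped as a keyword field) and avg#Priority.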


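EDIT

As the error message states, the Description field needs to be aggregatable before any aggregation can run on it.

So, if you drop your index, you should enable fielddata when you recreate it: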

PUT articles_dev1_nl
{
  "mappings": {
    "properties": {
      "CharacteristicsMultiValue": {
        "type": "nested",
        "properties": {
          .... other props ...
          
          "Description": {
            "type": "text",
            "fielddata": true        <---
          }
        }
      }
    }
  }
}

Or, if your index already exists, you can use the update mapping API:

PUT articles_dev1_nl/_mapping
{
  "properties": {
    "CharacteristicsMultiValue": {
      "type": "nested",
      "properties": {
        "Description": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}
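
To check that the change took effect, one option is the get field mapping API. This is only a sketch; the index name comes from the error response in the question.

```json
GET articles_dev1_nl/_mapping/field/CharacteristicsMultiValue.Description
```

If the update was applied, the returned mapping for Description should include "fielddata": true.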

You can read more about fielddata vs. keyword in the Elasticsearch docs.
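
For completeness, here is a minimal sketch of the other option the error message suggests: a keyword field instead of fielddata. It assumes a multi-field named keyword (a hypothetical name, chosen here for illustration) and an ignore_above of 1024 (also an assumption; the sample Description is longer than 256 characters, so a smaller limit would skip it).

```json
PUT articles_dev1_nl/_mapping
{
  "properties": {
    "CharacteristicsMultiValue": {
      "type": "nested",
      "properties": {
        "Description": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        }
      }
    }
  }
}
```

The terms aggregation would then target CharacteristicsMultiValue.Description.keyword instead of CharacteristicsMultiValue.Description. Documents indexed before the mapping change do not have the sub-field populated, so they would need to be reindexed first.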