How to move fields out of a nested object into separate objects in an Elasticsearch index

My index has a nested field that contains multiple objects.

 "customFields" : [
            {
              "objectTypeId" : 17,
              "Value" : "",
              "description" : "The original author of the document",
              "Name" : "Document Author"
            },
            {
              "objectTypeId" : 17,
              "Value" : "",
              "description" : "Source document number",
              "Name" : "Legacy document number"
            },
.
.
.
]

I want to create a script that moves the fields out of the customFields object into separate objects, like this:

"Document_Author": {
"Description": "The original author of the document",
"Value": "Some value",
"ObjectTypeId": 17
},

"Legacy document number": {
"Description": "Source document number",
"Value": "Some value",
"ObjectTypeId": 17
},
.
.
.

I tried a script like the one below. I'm quite new to Elasticsearch and scripting, so it doesn't work.

POST /new_document-20/_update_by_query
 {
  "script" : { "inline": "for (int i = 0; i < ctx._source.customFields.length; ++i) { ctx._source.add(\"customFields[i].Name\" : { \"Value\" : \"customFields[i].Value\", \"Description\" : \"customFields[i].description\", \"objectTypeId\" : \"customFields[i].objectTypeId\"}) }",
 
       "query": {
         "bool": {
           "must": [
             {
               "exists": {
                 "field": "customFields.Name"
          }
        }
      ]
    }
  }
  }
}

I get a compile error pointing at customFields[i].Name, like this:

"error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "compile error",
        "script_stack": [
          "... d(\"customFields[i].Name\" : { \"Value\" : \"customFiel ...",
          "                             ^---- HERE"

How can I create a script that moves the fields out of the nested object?

You can perform only one ctx._source write operation per loop iteration; otherwise you risk the "The maximum number of statements that can be executed in a loop has been reached." error.

That said, I'd suggest the following:

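  1. Copy the original _source
  2. Extract the customFields list
  3. Iterate over the extracted list and reshape each hash map to match the desired format
  4. Set the newly formed hash maps on the copied source
  5. Replace the original _source completely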
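To see what these five steps do to a single document, here is a minimal sketch in plain Python that mimics them on a dict standing in for `_source`. It is not Painless and does not talk to Elasticsearch; the function name `flatten_custom_fields` and the extra `title` field are made up for illustration.

```python
import copy


def flatten_custom_fields(source):
    """Mimic the five steps on a plain dict standing in for a document's _source."""
    # 1. Copy the original _source (a deep copy, so the input stays untouched).
    source_copy = copy.deepcopy(source)

    # 2. Extract the customFields list, removing it from the copy.
    custom_fields = source_copy.pop("customFields", [])

    # 3. Iterate over the extracted list and reshape each entry.
    for entry in custom_fields:
        # Remove AND return the name; the remaining keys become the new object.
        name = entry.pop("Name")

        # 4. Set the reshaped entry on the copied source under its name.
        source_copy[name] = entry

    # 5. The caller replaces the original _source with this result.
    return source_copy


# Sample document based on the question; "title" is a hypothetical extra field.
doc = {
    "title": "example",
    "customFields": [
        {
            "objectTypeId": 17,
            "Value": "",
            "description": "The original author of the document",
            "Name": "Document Author",
        },
        {
            "objectTypeId": 17,
            "Value": "",
            "description": "Source document number",
            "Name": "Legacy document number",
        },
    ],
}

flattened = flatten_custom_fields(doc)
print(sorted(flattened.keys()))
```

Note that the sketch keeps the original key casing (`description`, `objectTypeId`); renaming keys to match the desired output would be one more step inside the loop.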

In practice:

POST /new_document-20/_update_by_query
{
  "script": {
    "inline": """
      def source_copy = ctx._source;
      def customFields = source_copy.remove('customFields');
      
      for (int i = 0; i < customFields.length; i++) {
        // store the current iteratee
        def current = customFields[i];
        
        // remove AND return the name
        def name = current.remove('Name');
        
        // set in the _source
        source_copy[name] = current;
      }
      
      // replace the original source completely
      ctx._source = source_copy;
    """
  },
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "customFields.Name"
          }
        }
      ]
    }
  }
}

The same script as an escaped, single-line string:

"\n      def source_copy = ctx._source;\n      def customFields = source_copy.remove('customFields');\n      \n      for (int i = 0; i < customFields.length; i++) {\n        // store the current iteratee\n        def current = customFields[i];\n        \n        // remove AND return the name\n        def name = current.remove('Name');\n        \n        // set in the _source\n        source_copy[name] = current;\n      }\n      \n      // replace the original source completely\n      ctx._source = source_copy;\n    "

By the way, hash maps in Painless are instantiated either with a new HashMap() call or with the (somewhat confusing) [:] operator, i.e.:

def entries_map_without_name = [
   "Value" : current.Value, 
   "Description" : current.description,
   "objectTypeId" : current.objectTypeId
];

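P.S. The conversion you're attempting, from a list of nested objects to a bunch of hash maps, has its pros and cons, especially when it comes to mapping size bloat and very limited aggregation possibilities.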
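To get a rough feel for the mapping bloat mentioned above, the following Python sketch counts distinct leaf field paths before and after the conversion. It is a back-of-the-envelope estimate only: it ignores object mappings and multi-fields, and the 50 field names are invented for illustration. For context, Elasticsearch caps the number of fields per index through the `index.mapping.total_fields.limit` setting, which defaults to 1000.

```python
def leaf_fields_nested(custom_fields):
    """Distinct leaf paths when all entries stay in one customFields list."""
    return {f"customFields.{key}" for entry in custom_fields for key in entry}


def leaf_fields_flattened(custom_fields):
    """Distinct leaf paths after each entry is moved under its own Name key."""
    return {
        f"{entry['Name']}.{key}"
        for entry in custom_fields
        for key in entry
        if key != "Name"
    }


# 50 hypothetical custom fields, each with a distinct name.
entries = [
    {"Name": f"Custom field {i}", "Value": "", "description": "", "objectTypeId": 17}
    for i in range(50)
]

nested_count = len(leaf_fields_nested(entries))
flattened_count = len(leaf_fields_flattened(entries))
print(nested_count, flattened_count)
```

Under these assumptions the list form needs the same handful of paths no matter how many custom fields exist, while the converted form adds three paths for every distinct name.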

Shameless plug: I discuss exactly this in my Elasticsearch Handbook, specifically in this sub-chapter.