Select documents by an array of objects when at least one object doesn't contain a required field (Elasticsearch)

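I have documents in Elasticsearch, but I can't figure out how to apply a search script that should return a document if any attachment doesn't contain a uuid or its uuid is null. The Elasticsearch version is 5.2. The document mapping: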

"mappings": {
    "documentType": {
        "properties": {
            "attachment": {
                "properties": {
                    "uuid": {
                        "type": "text"
                    },
                    "path": {
                        "type": "text"
                    },
                    "size": {
                        "type": "long"
                    }
                }
            }
        }
    }
}

In Elasticsearch the documents look like this:

{
        "_index": "documents",
        "_type": "documentType",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "21321321",
                "path": "../uploads/somepath",
                "size": 1231
               },
               {
                "path": "../uploads/somepath",
                "size": 1231
               }
          ]
        }
},
{
        "_index": "documents",
        "_type": "documentType",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "223645641321321",
                "path": "../uploads/somepath",
                "size": 1231
               },
               {
                "uuid": "22341424321321",
                "path": "../uploads/somepath",
                "size": 1231
               }
          ]
        }
},
{
        "_index": "documents",
        "_type": "documentType",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "22789789341321321",
                "path": "../uploads/somepath",
                "size": 1231
               },
               {
                "path": "../uploads/somepath",
                "size": 1231
               }
          ]
        }
}

So I want to get the documents with _id 1 and 3, but instead I get a script error. This is the script I tried:

{
    "query": {
        "bool": {
            "must": [
                {
                    "exists": {
                        "field": "attachment"
                    }
                },
                {
                    "script": {
                        "script": {
                            "inline": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
                            "lang": "painless"
                        }
                    }
                }
            ]
        }
    }
}

This is the error:

 "root_cause": [
            {
                "type": "script_exception",
                "reason": "runtime error",
                "script_stack": [
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:77)",
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:36)",
                    "for (item in doc['attachment'].value) { ",
                    "                 ^---- HERE"
                ],
                "script": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
                "lang": "painless"
            }
        ],

Is it possible to select documents in which at least one attachment object doesn't contain a uuid?

Iterating over arrays of objects is not as trivial as one would expect. I've written about it at length here and here.

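Since your attachments are not defined as nested, ES will internally represent them as flattened lists of values (also called "doc values"). For instance, attachment.uuid in doc#2 will become ["223645641321321", "22341424321321"] and attachment.size will become [1231, 1231]. This is likely also why doc['attachment'] fails in your script: attachment is an object, and only its leaf fields, such as attachment.uuid, can be looked up through doc.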

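This means that you can simply compare the .length of these flattened representations! I assume attachment.size is always present and can therefore be used as the comparison baseline.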
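To make the length comparison concrete, here is a minimal Python sketch that imitates the flattening on the three sample documents from the question. It is only a simplified model, not how Elasticsearch is implemented: the helper names flatten_leaf_values and has_attachment_without_uuid are made up for illustration, and the sketch assumes that size is present on every attachment and that uuid values are unique within a document.

```python
from collections import defaultdict


def flatten_leaf_values(source):
    """Collect values per dotted leaf path, ignoring object boundaries.

    Simplified model of how a non-nested array of objects is flattened;
    the real doc values are built by Elasticsearch, not by this code.
    """
    flat = defaultdict(list)

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for item in node:
                walk(item, path)
        elif node is not None:
            flat[path].append(node)

    walk(source, "")
    return dict(flat)


def has_attachment_without_uuid(source):
    """True when fewer uuid values than size values were collected."""
    flat = flatten_leaf_values(source)
    uuid_count = len(flat.get("attachment.uuid", []))
    size_count = len(flat.get("attachment.size", []))  # assumed baseline
    return uuid_count < size_count


# The three sample documents from the question, keyed by _id.
docs = {
    "1": {"attachment": [
        {"uuid": "21321321", "path": "../uploads/somepath", "size": 1231},
        {"path": "../uploads/somepath", "size": 1231},
    ]},
    "2": {"attachment": [
        {"uuid": "223645641321321", "path": "../uploads/somepath", "size": 1231},
        {"uuid": "22341424321321", "path": "../uploads/somepath", "size": 1231},
    ]},
    "3": {"attachment": [
        {"uuid": "22789789341321321", "path": "../uploads/somepath", "size": 1231},
        {"path": "../uploads/somepath", "size": 1231},
    ]},
}

matching_ids = [doc_id for doc_id, src in docs.items()
                if has_attachment_without_uuid(src)]
print(matching_ids)  # ['1', '3']
```

Documents 1 and 3 each yield one uuid value against two size values, so only they pass the comparison.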

One more thing. To take advantage of these optimized doc values for text fields, one small mapping change is required, namely enabling fielddata on uuid:

PUT documents/documentType/_mappings
{
  "properties": {
    "attachment": {
      "properties": {
        "uuid": {
          "type": "text",
          "fielddata": true
        },
        "path": {
          "type": "text"
        },
        "size": {
          "type": "long"
        }
      }
    }
  }
}

Once that's done, reindex your documents, which can be done with this little Update by query trick:

POST documents/_update_by_query

Then you can use the following script query:

POST documents/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "attachment"
          }
        },
        {
          "script": {
            "script": {
              "inline": "def size_field_length = doc['attachment.size'].length; def uuid_field_length =  doc['attachment.uuid'].length; return uuid_field_length < size_field_length",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}

Just to add: if the mapping for the uuid field was created automatically, Elasticsearch adds it this way:

"uuid": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
}

Then the script could look like this:

POST documents/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "exists": {
                        "field": "attachment"
                    }
                },
                {
                    "script": {
                        "script": {
                            "inline": "doc['attachment.size'].length > doc['attachment.uuid.keyword'].length",
                            "lang": "painless"
                        }
                    }
                }
            ]
        }
    }
}
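
If the same check is needed for other fields, the request body can be generated instead of written by hand. The following is a minimal sketch under stated assumptions: build_missing_field_query is a hypothetical helper, not part of any Elasticsearch client, and it only builds the body (with the "inline" key used in the queries above) without sending it anywhere.

```python
import json


def build_missing_field_query(object_field, baseline_leaf, checked_leaf):
    """Build a search body that matches documents where the checked leaf
    has fewer values than the baseline leaf (hypothetical helper)."""
    script = (
        f"doc['{object_field}.{baseline_leaf}'].length > "
        f"doc['{object_field}.{checked_leaf}'].length"
    )
    return {
        "query": {
            "bool": {
                "must": [
                    {"exists": {"field": object_field}},
                    {"script": {"script": {"inline": script, "lang": "painless"}}},
                ]
            }
        }
    }


# Reproduces the query above for the automatically created keyword sub-field.
body = build_missing_field_query("attachment", "size", "uuid.keyword")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of POST documents/_search.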