Elasticsearch: filter documents that contain an array with empty strings

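I have documents in Elasticsearch, and I want to filter out the ones that contain only an array of empty strings, or nothing at all (an empty array).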

#doc 1
{
  "_index": "my-index-000001",
  "_type": "_doc",
  "_id": "0",
  "_source": {
    "doc":{
        "field": ["",""]
    }
  }
}

#doc 2
{
  "_index": "my-index-000001",
  "_type": "_doc",
  "_id": "0",
  "_source": {
    "doc":{
        "field": []
    }
  }
}

#doc 3
{
  "_index": "my-index-000001",
  "_type": "_doc",
  "_id": "0",
  "_source": {
    "doc":{
        "field": ["hello",""]
    }
  }
}

Is it possible to filter out only doc 1 and doc 2 from the documents above? In those two, `field` contains either nothing or only empty strings in the array.

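Please check the query below. It returns only the documents that contain an empty array or an array in which every element is an empty string.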

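Here the first should clause checks whether an empty string is part of the array. The second should clause checks whether the array field does not exist (Elasticsearch indexes no values for an empty array, so it counts as missing). The must_not clause with the wildcard "?*", which matches any value of at least one character, removes from the results every document whose array has at least one non-empty element.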

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "city.keyword": {
              "value": ""
            }
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "exists": {
                  "field": "city.keyword"
                }
              }
            ]
          }
        }
      ],
      "must_not": [
        {
          "wildcard": {
            "city.keyword": "?*"
          }
        }
      ]
    }
  }
}
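The query above targets `city.keyword` from my test index. To run it against the field from the question, the field name can be parameterized. Below is a minimal Python sketch that only builds the request body. It assumes the field has a `keyword` sub-field (as default dynamic mapping creates for strings), so that `doc.field.keyword` exists; the helper name `build_empty_array_query` is hypothetical.

```python
import json


def build_empty_array_query(field: str) -> dict:
    """Build the bool query shown above for an arbitrary exact-match field.

    `field` is assumed to be a keyword field, e.g. "doc.field.keyword"
    under default dynamic mapping (an assumption, not taken from the question).
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches documents whose array contains an empty string.
                    {"term": {field: {"value": ""}}},
                    # Matches documents with no indexed value for the field
                    # (missing field or empty array).
                    {"bool": {"must_not": [{"exists": {"field": field}}]}},
                ],
                # Drops documents that have at least one non-empty value.
                "must_not": [{"wildcard": {field: "?*"}}],
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a _search request.
    print(json.dumps(build_empty_array_query("doc.field.keyword"), indent=2))
```

The printed JSON can be sent as the body of a `_search` request on the index. If the field is mapped differently (for example as plain `keyword` without a sub-field), pass that field name instead.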

Below are the sample documents in my index:

{
"hits" : [
      {
        "_index" : "arrayindex",
        "_type" : "_doc",
        "_id" : "4g3P2H4BrzeQ9ErqJwUL",
        "_score" : 1.0,
        "_source" : {
          "city" : [
            "",
            ""
          ]
        }
      },
      {
        "_index" : "arrayindex",
        "_type" : "_doc",
        "_id" : "4w3P2H4BrzeQ9ErqXgWT",
        "_score" : 1.0,
        "_source" : {
          "city" : [ ]
        }
      },
      {
        "_index" : "arrayindex",
        "_type" : "_doc",
        "_id" : "5A3P2H4BrzeQ9ErqhwUI",
        "_score" : 1.0,
        "_source" : {
          "city" : [
            "hello",
            ""
          ]
        }
      },
      {
        "_index" : "arrayindex",
        "_type" : "_doc",
        "_id" : "5Q3q2H4BrzeQ9ErqOAXW",
        "_score" : 1.0,
        "_source" : {
          "city" : [
            "hello",
            "sagar"
          ]
        }
      }
    ]
}

Sample output after running the above query:

{
"hits" : [
      {
        "_index" : "arrayindex",
        "_type" : "_doc",
        "_id" : "4g3P2H4BrzeQ9ErqJwUL",
        "_score" : 0.5619608,
        "_source" : {
          "city" : [
            "",
            ""
          ]
        }
      },
      {
        "_index" : "arrayindex",
        "_type" : "_doc",
        "_id" : "4w3P2H4BrzeQ9ErqXgWT",
        "_score" : 0.0,
        "_source" : {
          "city" : [ ]
        }
      }
    ]
}
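To see why only those two documents come back, the query's logic can be mirrored in plain Python and applied to the four sample documents. This is a simplified sketch, not how Elasticsearch evaluates the query: it ignores scoring and mapping details, and the function name `matches` is hypothetical.

```python
def matches(values: list) -> bool:
    """Mirror the bool query on a plain list of strings."""
    has_empty_string = "" in values                # term query on ""
    field_missing = len(values) == 0               # must_not + exists
    has_non_empty = any(v != "" for v in values)   # wildcard "?*"
    # At least one should clause must match, and the must_not clause must not.
    return (has_empty_string or field_missing) and not has_non_empty


# The "city" values of the four sample documents, keyed by _id.
docs = {
    "4g3P2H4BrzeQ9ErqJwUL": ["", ""],
    "4w3P2H4BrzeQ9ErqXgWT": [],
    "5A3P2H4BrzeQ9ErqhwUI": ["hello", ""],
    "5Q3q2H4BrzeQ9ErqOAXW": ["hello", "sagar"],
}

matched = [doc_id for doc_id, city in docs.items() if matches(city)]
print(matched)  # ['4g3P2H4BrzeQ9ErqJwUL', '4w3P2H4BrzeQ9ErqXgWT']
```

The third document contains an empty string, so it satisfies the first should clause, but "hello" matches the wildcard and the must_not clause excludes it.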