Elasticsearch: find documents containing no more terms than the query

If I have these documents:

1: { "name": "red yellow" }
2: { "name": "green yellow" }

I want to query with "red brown yellow" and get document 1.

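What I mean is that the query must contain at least all the terms from my document, but it may contain more. If the document contains a token that is not in the query, it should not be a hit.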

How can I do this? The other way around is easy...

First, you have to declare the field with fielddata: true so that a script can read its tokens:

PUT test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}

Then you can filter the results with a script query:

POST test/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
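            "source": """
                // Keep the document only if every indexed token is one of the query terms.
                // An exact list lookup avoids false positives from substring matching.
                for (item in doc['name']) {
                  if (!params.terms.contains(item)) {
                    return false;
                  }
                }
                return true;
              """,
            "lang": "painless",
            "params": {
              "terms": ["red", "brown", "yellow"]
            }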
          }
        }
      },
      "must": [
        {
          "match": {
            "name": "red brown yellow"
          }
        }
      ]
    }
  }
}
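To make the intent of the request above easier to check, here is a minimal Python sketch of the same logic outside Elasticsearch. It assumes that lowercasing and splitting on whitespace is a good enough stand-in for the analyzer on these sample documents; `tokenize` and `is_hit` are hypothetical helper names, not part of any Elasticsearch API.

```python
from __future__ import annotations


def tokenize(text: str) -> set[str]:
    # Assumption: lowercase + whitespace split approximates the analyzer
    # for the simple sample documents in this thread.
    return set(text.lower().split())


def is_hit(doc_name: str, query: str) -> bool:
    doc_tokens = tokenize(doc_name)
    query_tokens = tokenize(query)
    # "must" clause: the document shares at least one term with the query.
    shares_a_term = bool(doc_tokens & query_tokens)
    # "filter" clause: the document has no token outside the query terms.
    no_extra_tokens = doc_tokens <= query_tokens
    return shares_a_term and no_extra_tokens


docs = {1: "red yellow", 2: "green yellow"}
hits = [doc_id for doc_id, name in docs.items() if is_hit(name, "red brown yellow")]
print(hits)  # [1]
```

Document 2 is rejected because "green" is not among the query terms, which is the behaviour the question asks for.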

Note that fielddata on text fields can be expensive in memory. It would be better if you can index this field as an array of keywords, like this:

1: { "name": ["red","yellow"] }
2: { "name": ["green", "yellow"] }

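The search request can stay essentially the same. One caveat: a keyword field is not analyzed at search time, so the match clause would likely need to become a terms query on ["red", "brown", "yellow"].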

The match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text. The minimum number of optional should clauses to match can be set using the minimum_should_match parameter.

To learn more about the match query, you can refer to the ES documentation.

Below is the mapping for the name field:

{
  "tests": {
    "mappings": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

Now, when you search for "red brown yellow" with the following query:

POST tests/_search
{
  "query": {
    "match": {
      "name": {
        "query": "red brown yellow",
        "minimum_should_match": "75%"
      }
    }
  }
}

You get the result you want:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.87546873,
    "hits": [
      {
        "_index": "tests",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.87546873,
        "_source": {
          "name": "red yellow"
        }
      }
    ]
  }
}

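The output will not include "green yellow". With three query terms, 75% works out to 2.25, which is rounded down to 2 required matches. The first document matches two terms (red, yellow), while the second matches only one (yellow), that is 1/3 of the query terms, which is below the threshold.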
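The threshold arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: whitespace tokenization stands in for the analyzer, percentages are rounded down, and `required_matches` and `passes_threshold` are hypothetical helper names rather than Elasticsearch functions.

```python
def required_matches(num_terms: int, percent: int) -> int:
    # Percentage-based minimum_should_match values are rounded down:
    # 3 terms at 75% -> 2.25 -> 2 required matches.
    return num_terms * percent // 100


def passes_threshold(doc_name: str, query: str, percent: int = 75) -> bool:
    query_terms = query.lower().split()
    doc_tokens = set(doc_name.lower().split())
    # Count how many query terms appear in the document.
    matched = sum(1 for term in query_terms if term in doc_tokens)
    return matched >= required_matches(len(query_terms), percent)


query = "red brown yellow"
print(required_matches(3, 75))                  # 2
print(passes_threshold("red yellow", query))    # True
print(passes_threshold("green yellow", query))  # False
```

One thing this sketch makes visible: a hypothetical third document such as "red green yellow" would also pass, since it matches two of the three query terms. The threshold separates the two sample documents, but on its own it does not reject documents that carry extra tokens.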