How to exclude a field from getting searched by elasticsearch 6.1?

Question

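I have an index with multiple fields. I want to filter documents by whether the search string is present in any field except one, user_comments. The query I am running is: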

{
  "from": offset,
  "size": limit,
  "_source": [
    "document_title"
  ],
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": {
              "query": "#{query}"
            }
          }
        }
      }
    }
  }
}

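The query string searches all fields, so it also returns documents whose only match is in the user_comments field. I want to run it against every field except user_comments. The whitelist is very large and the field names are dynamic, so listing the whitelisted fields in the fields parameter, as below, is not feasible.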

"query_string": {
                    "query": "#{query}",
                    "fields": [
                      "document_title",
                      "field2"
                    ]
                  }

Can anyone suggest how to exclude a field from the search?

Answer 1

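The way you are searching, ES looks for matches in the _all field. To exclude a field, you can disable _all for the user_comments field.

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html#enabling-all-field

For ES 6.x, you can replicate this behavior with copy_to: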

https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
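A minimal sketch of the copy_to idea, written as Python that only builds the request bodies. The catch-all field name "searchable", the type name "doc" and the helper names are assumptions made for illustration. It relies on a dynamic template to add copy_to to every new string field, while user_comments is mapped explicitly without copy_to so it never reaches the catch-all field.

```python
import json


def build_mapping(excluded_field, catch_all="searchable"):
    """Build an index-creation body where every dynamically added string
    field is copied into `catch_all`, except `excluded_field`."""
    return {
        "mappings": {
            "doc": {  # hypothetical mapping type name
                "dynamic_templates": [
                    {
                        "strings_to_catch_all": {
                            "match_mapping_type": "string",
                            "mapping": {"type": "text", "copy_to": catch_all},
                        }
                    }
                ],
                "properties": {
                    # Mapped explicitly, so the dynamic template does not apply
                    # and the field is not copied into the catch-all field.
                    excluded_field: {"type": "text"},
                    catch_all: {"type": "text"},
                },
            }
        }
    }


def build_query(user_query, catch_all="searchable"):
    """Search only the catch-all field instead of every field."""
    return {
        "query": {
            "query_string": {"query": user_query, "default_field": catch_all}
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping("user_comments"), indent=2))
    print(json.dumps(build_query("Bristol"), indent=2))
```

The first body would be sent when the index is created and the second as the search request. Existing documents would need to be reindexed before the catch-all field is populated.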

Answer 2

There is a way to make this work. It is not pretty, but it gets the job done. You can achieve your goal by using the boost and multifield parameters of query_string, a bool query to combine the scores, and min_score:

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "#{query}",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "#{query}",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}

So what is happening behind the scenes?

Suppose you have the following set of documents:

PUT my-query-string/doc/1
{
  "title": "Prodigy in Bristol",
  "text": "Prodigy in Bristol",
  "comments": "Prodigy in Bristol"
}
PUT my-query-string/doc/2
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Bristol"
}
PUT my-query-string/doc/3
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham and Bristol",
  "comments": "And also in Cardiff"
}
PUT my-query-string/doc/4
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Cardiff"
}

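In your search request you only want to see documents 1 and 3, but your original query returns 1, 2 and 3.

In Elasticsearch, search results are sorted by relevance _score: the higher the score, the better.

So let's try to boost down the "comments" field so that its contribution to the relevance score is cancelled out. We can do this by combining two queries with should and using a negative boost: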

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}

This gives us the following output:

{
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      }
    ]
  }
}

Document 2 got penalized, but so did document 1, even though it is a match we want. Why did that happen?

Here is how Elasticsearch computes _score in this case:

_score = max(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"

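Document 1 matches the comments:"Bristol" part, and that score also happens to be the best one among its fields. According to our formula, the resulting score is 0.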
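The formula above can be replayed on the four sample documents with a toy model. This is only a sketch: it assumes every field that contains the term scores exactly 0.2876821 (the value seen in the sample output), whereas real scores depend on the field contents and on the shard. The function names are made up for illustration.

```python
# Assumed score for any field that contains the term; copied from the sample output.
FIELD_SCORE = 0.2876821

DOCS = {
    "1": {"title": "Prodigy in Bristol", "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol"},
    "2": {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"},
    "3": {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"},
    "4": {"title": "Prodigy in Birmigham", "text": "Prodigy in Birmigham",
          "comments": "And also in Cardiff"},
}


def field_scores(doc, term):
    """Toy per-field score: FIELD_SCORE if the term is one of the words, else 0."""
    return {
        name: FIELD_SCORE if term.lower() in value.lower().split() else 0.0
        for name, value in doc.items()
    }


def max_minus_comments(doc, term):
    """_score = max(all fields) - comments, as in the formula above."""
    scores = field_scores(doc, term)
    return max(scores.values()) - scores["comments"]


if __name__ == "__main__":
    for doc_id, doc in DOCS.items():
        # Document 4 matches nothing, so Elasticsearch would not return it at all.
        print(doc_id, max_minus_comments(doc, "Bristol"))
```

In this model documents 1 and 2 both end up at 0, because for each of them the best field score equals the comments score that gets subtracted.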

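What we actually want is to boost the first clause (the one over "all" fields) more when more fields match.

Can we boost `query_string` matching more fields?

We can: query_string in multifield mode has a type parameter that does exactly that. The query will look like this: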

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "type": "most_fields",
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}

This gives us the following output:

{
  "hits": {
    "total": 3,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      }
    ]
  }
}

As you can see, the undesired document 2 is at the bottom and has a score of 0. Here is how the score was computed this time:

_score = sum(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"

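So documents matching "Bristol" in any field were selected. The relevance score for comments:"Bristol" was cancelled out, and only documents matching title:"Bristol" or text:"Bristol" got a _score > 0.

Can we filter out the results with an undesired score?

Yes, we can, using min_score: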

POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}

This works (in our case) because a document's score is 0 if and only if "Bristol" matched only the "comments" field and no other field.

The output will be:

{
  "hits": {
    "total": 2,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      }
    ]
  }
}
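The sum-based formula and the min_score cut-off can be checked together with a small model. As before this is a sketch, not what Elasticsearch executes: each matching field is assumed to score 0.2876821, the table of matched fields is written out by hand from the sample documents, and the function names are hypothetical.

```python
FIELD_SCORE = 0.2876821  # assumed score per matching field, from the sample output
MIN_SCORE = 0.00001

# Fields in which "Bristol" occurs, per sample document.
MATCHED_FIELDS = {
    "1": ["title", "text", "comments"],
    "2": ["comments"],
    "3": ["text"],
    "4": [],
}


def sum_minus_comments(matched):
    """_score = sum(all matching fields) - comments."""
    total = FIELD_SCORE * len(matched)
    penalty = FIELD_SCORE if "comments" in matched else 0.0
    return total - penalty


def search(matched_fields, min_score):
    """Return (doc_id, score) pairs at or above min_score, best first."""
    hits = []
    for doc_id, matched in matched_fields.items():
        if not matched:
            continue  # no clause matches, so the document is never a hit
        score = sum_minus_comments(matched)
        if score >= min_score:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


HITS = search(MATCHED_FIELDS, MIN_SCORE)

if __name__ == "__main__":
    print(HITS)
```

Only documents 1 and 3 survive the threshold in this model, which agrees with the output above. The cut-off is fragile in general: if a real comments score differs even slightly from its contribution to the first clause, the difference is no longer exactly 0.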

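Can it be done in a different way?

Certainly. I would actually not recommend tweaking _score, because it gets complicated very quickly.

I would suggest fetching the existing mapping and building the list of fields to run the query against beforehand. That makes the code much simpler and more straightforward.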
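A rough sketch of that approach, assuming the "properties" section of the mapping (as returned by GET my-query-string/_mapping) has already been loaded into a Python dict. The helper name, the sample mapping and the choice to keep only text and keyword fields are assumptions for illustration.

```python
def searchable_fields(properties, excluded, prefix=""):
    """Collect full names of text/keyword fields, skipping excluded ones."""
    fields = []
    for name, spec in properties.items():
        path = prefix + name
        if path in excluded:
            continue
        if "properties" in spec:
            # Object field: descend and use dotted names for the sub-fields.
            fields.extend(searchable_fields(spec["properties"], excluded, path + "."))
        elif spec.get("type") in ("text", "keyword"):
            fields.append(path)
    return fields


# Hypothetical mapping used only to exercise the helper.
SAMPLE_PROPERTIES = {
    "document_title": {"type": "text"},
    "user_comments": {"type": "text"},
    "discount_percent": {"type": "long"},
    "author": {"properties": {"name": {"type": "text"}}},
}

FIELDS = searchable_fields(SAMPLE_PROPERTIES, excluded={"user_comments"})

# Request body with an explicit whitelist generated from the mapping.
QUERY = {"query": {"query_string": {"query": "Bristol", "fields": FIELDS}}}

if __name__ == "__main__":
    print(QUERY)
```

Because the list is rebuilt from the mapping, dynamically added fields are picked up the next time the mapping is fetched, and numeric fields are left out of the text query.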

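Original solution proposed in the answer (kept for history)

Initially this kind of query was suggested, with exactly the same intent as the solution above: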

POST my-query-string/doc/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": {
              "fields" : ["*", "comments^0"],
              "query": "#{query}"
            }
          }
        }
      }
    }
  },
  "min_score": 0.00001
}

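The only problem is that if the index contains any numeric values, this part:

"fields": ["*"]

raises an error, because a text query string cannot be applied to a number.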
