No match on document if the search string is longer than the search field

Question

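I have a title that I am searching for.

The title is stored in the document as "Police diaries : stefan zweig".

When I search for "Police" I get the result, but when I search for "Policeman" I do not.

Here is the query: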

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "fields": [
              "title",
              omitted because irrelevance...
            ],
            "query": "Policeman",
            "fuzziness": "1.5",
            "prefix_length": "2"
          }
        }
      ],
      "must": {
        omitted because irrelevance...
      }
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}

Here is the mapping:

{
    "books": {
        "mappings": {
            "book": {
                "_all": {
                    "analyzer": "nGram_analyzer", 
                    "search_analyzer": "whitespace_analyzer"
                },
                "properties": {
                    "title": {
                        "type": "text",
                        "fields": {
                            "raw": {
                                "type": "keyword"
                            },
                            "sort": {
                                "type": "text",
                                "analyzer": "to order in another language, (creates a string with symbols)",
                                "fielddata": true
                            }
                        }
                    }
                }
            }
        }
    }
}

Note that a document with the title "some title" does come back when I search for "someone title".

I can't figure out why the police book is not showing up.

Answer 1

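So there are two parts to your question:

You want to find titles containing police when you search for policeman.
You want to know why the some title document matches the someone title query, and on that basis you expect the first document to match as well.

Let me first explain why the second query matches and the first one does not, and then show how to make the first query work.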

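Your document containing some title produces the following tokens, which you can verify with the Analyze API. The standard analyzer is used here because it is the default analyzer for a text field.

POST /_analyze

{
    "text": "some title",
    "analyzer": "standard"
}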

Generated tokens

{
    "tokens": [
        {
            "token": "some",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "title",
            "start_offset": 5,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}
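
To make the token output above easier to reason about, here is a minimal Python sketch that approximates it. It is an assumption-laden stand-in, not the real analyzer: the function name `approx_standard_analyze` is hypothetical, and a simple `\w+` regex plus lowercasing only roughly imitates the standard analyzer, which is enough for plain ASCII titles like the ones in this question.

```python
import re


def approx_standard_analyze(text):
    """Rough approximation of the standard analyzer (assumption, not the real thing):
    split on non-word characters, lowercase, and record offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


for token in approx_standard_analyze("some title"):
    print(token)
```

For real data, the Analyze API remains the authoritative way to see what is indexed.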

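Now, when you search for someone title with a match query, the query text is analyzed with the same analyzer that was applied to the field at index time.

It therefore produces two tokens, someone and title, and the match query matches on the title token. That is why the document appears in your search results. You can also use the Explain API to see the details of how it matched. By contrast, Policeman is analyzed into the single token policeman, while the title "Police diaries : stefan zweig" is indexed as police, diaries, stefan and zweig. No token matches exactly, and the edit distance from policeman to police is 3, which is more than fuzzy matching allows (Elasticsearch caps fuzziness at 2 edits).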
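
The reasoning above can be checked with a small Python sketch. It is only an illustration under stated assumptions: `tokens`, `shared_terms` and `edit_distance` are hypothetical helpers, the tokenizer is a regex approximation of the standard analyzer, and the distance is plain Levenshtein (Elasticsearch also counts transpositions by default, which makes no difference for these words).

```python
import re


def tokens(text):
    """Approximate the standard analyzer: split on non-word characters and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def shared_terms(query, field_value):
    """Terms that appear both in the analyzed query and in the analyzed field value."""
    return sorted(set(tokens(query)) & set(tokens(field_value)))


def edit_distance(a, b):
    """Plain Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute or keep
            ))
        previous = current
    return previous[-1]


print(shared_terms("someone title", "some title"))                 # ['title']
print(shared_terms("Policeman", "Police diaries : stefan zweig"))  # []
print(edit_distance("policeman", "police"))                        # 3
```

The second query only matches because of the shared title term; someone is just as far from some (3 edits) as policeman is from police.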

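How to bring up the `police` title when searching for `policeman`

You need to use a synonym token filter, as in the example below, where the rule `policeman => police` rewrites the token policeman to police.

Index definition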

{
    "settings": {
        "analysis": {
            "analyzer": {
                "synonyms": {
                    "filter": [
                        "lowercase",
                        "synonym_filter"
                    ],
                    "tokenizer": "standard"
                }
            },
            "filter": {
                "synonym_filter": {
                    "type": "synonym",
                    "synonyms": ["policeman => police"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "dialog": {
                "type": "text",
                "analyzer": "synonyms"
            }
        }
    }
}

Index a sample document

{
  "dialog" : "police"
}

Search query containing the term `policeman`

{
    "query": {
        "match" : {
            "dialog" : {
                "query" : "policeman"
            }
        }
    }
}

And the search result. Note that `_source` contains only `police`.

 "hits": [
      {
        "_index": "so_syn",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "dialog": "police"
        }
      }
    ]
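
To see why this hit comes back, here is a rough Python sketch of what the custom analyzer does. Everything in it is an illustrative assumption rather than Elasticsearch code: `SYNONYM_RULES`, `synonyms_analyzer` and `matches` are hypothetical names, and the regex tokenizer only approximates the standard tokenizer. Because the mapping sets only `analyzer` and no separate search analyzer, the same chain runs at both index time and search time.

```python
import re

# Hypothetical in-memory stand-in for the "policeman => police" synonym rule.
SYNONYM_RULES = {"policeman": "police"}


def synonyms_analyzer(text):
    """Approximate the custom analyzer: tokenize, lowercase, then apply synonym replacement."""
    lowered = [t.lower() for t in re.findall(r"\w+", text)]
    return [SYNONYM_RULES.get(token, token) for token in lowered]


def matches(query, field_value):
    """A match query hits when the analyzed query and field share at least one term."""
    return bool(set(synonyms_analyzer(query)) & set(synonyms_analyzer(field_value)))


print(synonyms_analyzer("policeman"))   # ['police']
print(matches("policeman", "police"))   # True
```

The stored `_source` is never rewritten; only the analyzed terms change, which is why the hit still shows `police`.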

synonym

elasticsearch

elasticsearch-query

elasticsearch-analyzers
