ElasticSearch

Question

ElasticSearch 1.6

I want to index text that contains hyphens, such as U-12, U-17, WU-12, t-shirt... and be able to search it with a "Simple Query String" query.

Data sample (simplified):

{"title":"U-12 Soccer",
 "comment": "the t-shirts are dirty"}

Since there are already many questions about hyphens, I have already tried the following solution:

Using a char filter: ElasticSearch - Searching with hyphens in name.

So I created this mapping:

{
  "settings":{
    "analysis":{
      "char_filter":{
        "myHyphenRemoval":{
          "type":"mapping",
          "mappings":[
            "-=>"
          ]
        }
      },
      "analyzer":{
        "default":{
          "type":"custom",
          "char_filter":  [ "myHyphenRemoval" ],
          "tokenizer":"standard",
          "filter":[
            "standard",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings":{
    "test":{
      "properties":{
        "title":{
          "type":"string"
        },
        "comment":{
          "type":"string"
        }
      }
    }
  }
}

The search is done with the following query:

{
  "_source":true,
  "query":{
    "simple_query_string":{
      "query":"<Text>",
      "default_operator":"AND"
    }
  }
}

What works:

"U-12", "U*", "t*", "ts*"

What doesn't work:

"U-*", "u-1*", "t-*", "t-sh*", ...

So it seems the char filter is not applied to the search string? What can I do to make this work?
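To see why the hyphenated wildcards fail, here is a small Python simulation. It is only a sketch under assumptions: `analyze` approximates the custom analyzer from the mapping above (the regex split is a crude stand-in for the real standard tokenizer), and `raw_prefix_search` assumes a wildcard term is lowercased but otherwise not analyzed, which is what the results above suggest.

```python
import re

def analyze(text: str) -> list[str]:
    """Rough stand-in for the custom "default" analyzer in the mapping above."""
    # char_filter "myHyphenRemoval": the mapping rule "-=>" deletes every hyphen
    text = text.replace("-", "")
    # tokenizer "standard": approximated as runs of word characters (not the real UAX #29 rules)
    tokens = re.findall(r"\w+", text)
    # token filter "lowercase"
    return [t.lower() for t in tokens]

def raw_prefix_search(pattern: str, tokens: list[str]) -> bool:
    """Wildcard term that is lowercased but NOT analyzed (assumed default behaviour)."""
    prefix = pattern.rstrip("*").lower()
    return any(t.startswith(prefix) for t in tokens)

# Tokens produced at index time for the sample document
indexed = analyze("U-12 Soccer") + analyze("the t-shirts are dirty")
print(indexed)  # ['u12', 'soccer', 'the', 'tshirts', 'are', 'dirty']

for q in ["U*", "t*", "ts*", "U-*", "u-1*", "t-*", "t-sh*"]:
    print(q, raw_prefix_search(q, indexed))
```

The index only contains `u12` and `tshirts`, so an unanalyzed prefix such as `u-1` or `t-sh` can never match, while `U*`, `t*` and `ts*` do.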

Answer 1

The answer is simple:

Quote from Igor Motov (Configuring the standard tokenizer):

By default the simple_query_string query doesn't analyze the words with wildcards. As a result it searches for all tokens that start with i-ma. The word i-mac doesn't match this request because during analysis it's split into two tokens i and mac and neither of these tokens starts with i-ma. In order to make this query find i-mac you need to make it analyze wildcards:

{
  "_source":true,
  "query":{
    "simple_query_string":{
      "query":"u-1*",
      "analyze_wildcard":true,
      "default_operator":"AND"
    }
  }
}
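Continuing the same simplified model, `analyze_wildcard` can be pictured as running the text before the `*` through the field's analyzer before the prefix comparison. This is a sketch, not Elasticsearch's actual implementation: `analyze` is the same rough approximation of the question's custom analyzer (hyphen-removing char filter, regex stand-in for the standard tokenizer, lowercase), and `analyzed_prefix_search` is a hypothetical helper that only models wildcard terms that analyze to a single token.

```python
import re

def analyze(text: str) -> list[str]:
    """Rough stand-in for the question's custom "default" analyzer."""
    text = text.replace("-", "")          # char_filter "-=>" removes hyphens
    tokens = re.findall(r"\w+", text)     # crude approximation of the standard tokenizer
    return [t.lower() for t in tokens]    # lowercase token filter

def analyzed_prefix_search(pattern: str, tokens: list[str]) -> bool:
    """Model of "analyze_wildcard": true for a single term ending in '*'."""
    pieces = analyze(pattern.rstrip("*"))
    # With the hyphen-removing char filter, "u-1" -> ["u1"] and "t-sh" -> ["tsh"]
    if len(pieces) != 1:
        raise ValueError("this sketch only models wildcard terms that analyze to one token")
    return any(t.startswith(pieces[0]) for t in tokens)

indexed = analyze("U-12 Soccer") + analyze("the t-shirts are dirty")
for q in ["U-*", "u-1*", "t-*", "t-sh*"]:
    print(q, analyzed_prefix_search(q, indexed))  # all four print True
```

Because the prefix now goes through the same char filter as the indexed text, `u-1*` becomes the prefix `u1`, which matches the indexed token `u12`.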

Answer 2

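The quote from Igor Motov is correct: you have to add "analyze_wildcard":true to make it work with wildcards. But it is important to note that the hyphen actually splits "u-12" into two separate tokens, "u" and "12".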

If it is important to keep the original text, don't use the mapping char filter. Otherwise, it can be useful.

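Suppose you have "m0-77", "m1-77" and "m2-77". If you search for m*-77, you will get zero hits. However, you can replace the "-" (hyphen) with AND to connect the two separated words and then search for m* AND 77, which gives you the correct hits.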

You can do this replacement on the client side.

For the examples in your question, u-1* becomes:

{
  "query":{
    "simple_query_string":{
      "query":"u AND 1*",
      "analyze_wildcard":true
    }
  }
}

t-sh*

{
  "query":{
    "simple_query_string":{
      "query":"t AND sh*",
      "analyze_wildcard":true
    }
  }
}
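A minimal client-side sketch of this rewriting in Python, assuming the fields are analyzed with the standard tokenizer *without* the hyphen-removing char filter (as this answer recommends), so that "u-12" is indexed as the two tokens "u" and "12". The helper names `hyphens_to_and` and `build_query` are hypothetical; the request body has the same shape as the queries above.

```python
import json
import re

# A hyphen with a word character or "*" on both sides; a leading "-" (the NOT operator) is left alone
_INNER_HYPHEN = re.compile(r"(?<=[\w*])-(?=[\w*])")

def hyphens_to_and(user_query: str) -> str:
    """Replace each inner hyphen with " AND " before sending the query."""
    return _INNER_HYPHEN.sub(" AND ", user_query)

def build_query(user_query: str) -> dict:
    """Request body in the shape used in this answer."""
    return {
        "query": {
            "simple_query_string": {
                "query": hyphens_to_and(user_query),
                "analyze_wildcard": True,
            }
        }
    }

print(hyphens_to_and("m*-77"))  # m* AND 77
print(json.dumps(build_query("t-sh*")))
```

Only hyphens between two word characters (or a `*`) are rewritten, so a leading `-` that `simple_query_string` treats as negation is left untouched.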

Answer 3

If anyone is still looking for a simple workaround for this problem, replace hyphens with underscores (_) when indexing the data.

For example, O-000022334 should be indexed as O_000022334.

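When displaying the results, replace the underscores with hyphens again. This way you can search for "O-000022334" (applying the same hyphen-to-underscore replacement to the query) and it will find the correct match.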
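A minimal sketch of this workaround in Python. The helpers `to_indexed` and `to_display` are hypothetical, and applying the substitution to the user's query as well is an assumption the answer implies, since the index no longer contains hyphens. It relies on the standard tokenizer (which follows the Unicode UAX #29 word-break rules) keeping underscore-joined text such as O_000022334 together as one token.

```python
def to_indexed(value: str) -> str:
    """Apply before indexing a field value and before sending the user's query."""
    return value.replace("-", "_")

def to_display(value: str) -> str:
    """Apply to field values in search results before showing them.

    Caveat: this also turns underscores that were in the original data into hyphens.
    """
    return value.replace("_", "-")

stored = to_indexed("O-000022334")   # value sent to Elasticsearch
query = to_indexed("O-000022334")    # what the user typed, rewritten the same way
print(stored, query == stored)       # O_000022334 True
print(to_display(stored))            # O-000022334
```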
