How do I search for an ampersand (&) in Elasticsearch?

Question

In Elasticsearch, how do I search for a value that contains an ampersand? I tried:

http://localhost:9200/my_index/_search?q=name:"procter \u0026 gamble"

Answer 1

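There are several ways to do this. One is to declare the string as not_analyzed in the mapping (see below, here as a name.raw sub-field) and then search for the exact value that was indexed.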

curl -XPUT localhost:9200/tests -d '{
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}'

Now index a sample document:

curl -XPUT localhost:9200/tests/test/1 -d '{"name":"procter & gamble"}'

Finally, your search query will return the document you expect:

curl -XGET 'localhost:9200/tests/test/_search?q=name.raw:"procter%20%26%20gamble"'
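
The %26 in that query is the percent-encoded form of &. In a URL query string a bare & separates parameters, so it has to be encoded to reach Elasticsearch as part of the q value. Here is a minimal Python sketch of building such a URL. The helper name build_search_url is made up for illustration, and the host, index and field names are simply the ones from this answer.

```python
from urllib.parse import quote, parse_qs, urlsplit


def build_search_url(base, field, phrase):
    """Build a URI-search URL whose q parameter is a quoted phrase.

    build_search_url is a hypothetical helper, not part of any
    Elasticsearch client library.
    """
    # Double quotes make the query-string syntax treat the value as a phrase.
    query = f'{field}:"{phrase}"'
    # safe='' percent-encodes every reserved character, including & and spaces.
    return f"{base}?q={quote(query, safe='')}"


url = build_search_url(
    "http://localhost:9200/tests/test/_search", "name.raw", "procter & gamble"
)
print(url)

# Round trip: decoding the query string gives back the original phrase query.
print(parse_qs(urlsplit(url).query)["q"][0])
```

When the URL is built this way, the shell never sees a raw space or & and the server receives the phrase intact.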

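UPDATE: Here is another, more involved approach. It uses an nGram tokenizer to index every possible token of your name with a length between 2 and 20 (chosen arbitrarily).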

curl -XPUT localhost:9200/tests -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "tokenizer": "ngram_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer": "ngram_analyzer",
          "search_analyzer": "keyword"
        }
      }
    }
  }
}'
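
To see why this mapping matches both whole and partial names, here is a rough Python model of what gets indexed. It assumes the tokenizer keeps every character (spaces and & included) and that the lowercase filter runs afterwards. The function name char_ngrams is made up for illustration, and Lucene's real tokenizer differs in details such as emission order and token positions.

```python
def char_ngrams(text, min_gram=2, max_gram=20):
    """Return the set of lowercased character n-grams of text.

    A simplified stand-in for an nGram tokenizer followed by a
    lowercase filter; not the actual Lucene implementation.
    """
    lowered = text.lower()
    grams = set()
    for start in range(len(lowered)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(lowered):
                break  # no longer gram fits from this start position
            grams.add(lowered[start:end])
    return grams


indexed = char_ngrams("procter & gamble")

# With the keyword search analyzer the whole query string is a single term,
# so a query matches when that exact term is one of the indexed n-grams.
for query in ["procter & gamble", "procter", "procter &", "gamble"]:
    print(repr(query), query in indexed)
```

One consequence of this model: the keyword search analyzer does not lowercase the query, so a query such as Procter with a capital P would not be among the lowercased n-grams.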

Then you can search for the exact name as before:

curl -XGET 'localhost:9200/tests/test/_search?q=name:"procter%20%26%20gamble"'

or by only some of the tokens in your name:

curl -XGET localhost:9200/tests/test/_search?q=name:procter
curl -XGET 'localhost:9200/tests/test/_search?q=name:"procter%20%26"'
curl -XGET localhost:9200/tests/test/_search?q=name:gamble

Answer 2

I took a slightly different approach: create a custom pattern analyzer that keeps certain special characters (I used &'-@).

By default, the pattern analyzer in ES splits on "\W+".

From the ES documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html

The pattern analyzer uses a regular expression to split the text into terms. The regular expression should match the token separators not the tokens themselves. The regular expression defaults to \W+ (or all non-word characters).

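So, to keep the special characters I wanted, I had to invert the logic of that regular expression. Here is my custom analyzer. The hyphen goes last in the character class so that it is read as a literal rather than as a range, and the backslash is doubled because the pattern sits inside a JSON string.

    "special_chars_analyzer": {
      "type": "pattern",
      "pattern": "[^\\w&'@-]+",
      "lowercase": true
    }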
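
To illustrate the difference between the default pattern and an inverted one, here is a small Python approximation of a pattern analyzer: split on the separator regex, drop empty pieces, lowercase. The function pattern_analyze is a made-up name, and Python's re module is only standing in for the Java regex engine that Elasticsearch uses, so treat it as a sketch. The hyphen is placed last in the character class so that it is a literal, not a range.

```python
import re


def pattern_analyze(text, pattern, lowercase=True):
    """Approximate a pattern analyzer: the regex matches separators.

    pattern_analyze is a hypothetical helper for illustration only.
    re.ASCII keeps \\w limited to [a-zA-Z0-9_].
    """
    tokens = [t for t in re.split(pattern, text, flags=re.ASCII) if t]
    return [t.lower() for t in tokens] if lowercase else tokens


DEFAULT_PATTERN = r"\W+"          # every non-word character is a separator
CUSTOM_PATTERN = r"[^\w&'@-]+"    # & ' @ - are kept as token characters

for text in ["Procter & Gamble", "AT&T"]:
    print(text, pattern_analyze(text, DEFAULT_PATTERN))
    print(text, pattern_analyze(text, CUSTOM_PATTERN))
```

With the default pattern the & disappears from the token stream, whereas the custom pattern keeps it as a token of its own or as part of a token such as at&t.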
