Search by Special Character using ElasticSearch

Question

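I'm using ElasticSearch in a VB.NET project. Normal search works fine, i.e. searching by any word. However, a new requirement is that I also need to search by special characters, e.g. ?. I'm passing ? as a regular search term, but it doesn't work properly.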

Code:

client.CreateIndex(Function(d) d.Analysis(Function(z) z.Analyzers(Function(a) a.Add("nGram_analyzer", Get_nGram_analyzer()).
Add("whitespace_analyzer", Get_whitespace_analyzer()).
Add("autocmp", New Nest.CustomAnalyzer() With {.Tokenizer = "edgeNGram", .Filter = {"lowercase"}})).
Tokenizers(Function(t) t.Add("edgeNGram", New Nest.EdgeNGramTokenizer With {.MinGram = 1, .MaxGram = 20})).
TokenFilters(Function(t) t.Add("nGram_filter", Get_nGram_filter()))).
Index(Of view_Article).AddMapping(Of view_Article)(ArticleMapping)

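Private Shared Function Get_nGram_filter() As NgramFilter
        ' Note: in Elasticsearch, token_chars is a setting of the ngram/edge_ngram
        ' tokenizer; it is not a parameter of the ngram token filter.
        Return New NgramFilter With {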
            .MinGram = 1,
            .MaxGram = 20,
            .token_chars = New List(Of String) From {"letter", "digit", "punctuation", "symbol"}
        }
End Function

Private Shared Function Get_nGram_analyzer() As CustomAnalyzer
        Return New CustomAnalyzer() With {
            .Tokenizer = "whitespace",
            .Filter = New List(Of String)() From {"lowercase", "asciifolding", "nGram_filter"}
        }
End Function

Private Shared Function Get_whitespace_analyzer() As CustomAnalyzer
        Return New CustomAnalyzer() With {
            .Tokenizer = "whitespace",
            .Filter = New List(Of String)() From {"lowercase", "asciifolding"}
        }
End Function

Search query:

"query": {
    "query_string": {
      "query": "\?",
      "fields": [
        "title"
      ],
      "default_operator": "and",
      "analyze_wildcard": true
    }
  }

Note: I need to search in several ways, i.e. by keyword, by keyword + special character, or by special character only.

Answer 1

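Changing my answer based on the discussion with @jeeten. The answer given by @Nishant also works, but it has the following functional and non-functional issues:

Functional issue:

The requirement is to search only on the special characters ? and /, whereas that approach makes all punctuation characters searchable.

Non-functional issues:

It indexes title in 3 fields with different formats, which increases the index size on disk and also puts more pressure on memory, because Elasticsearch relies on caching the inverted index for good search performance.
Likewise, every search has to query all three fields, and searching more fields again costs performance.
Tokens are duplicated across the three title fields.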

My solution

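To address the functional and non-functional issues above, I use the pattern_capture token filter to index only ? and /. It also sets "preserve_original": true so that searches like foo? are supported.

I also index only 2 fields and search on just those two fields, to improve performance.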
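To illustrate what the splcharanalyzer chain (keyword tokenizer, then pattern_capture, then lowercase) emits for a title, here is a rough local Python simulation. It is an approximation written for illustration, not Elasticsearch code; the function name splchar_analyze is hypothetical, and the "keep the original when nothing matches" rule is assumed to mirror the documented pattern_capture behaviour.

```python
import re
from typing import List

# Same capture pattern as "splcharfilter" in the index definition.
SPLCHAR_PATTERN = re.compile(r"([?/])")


def splchar_analyze(text: str, preserve_original: bool = True) -> List[str]:
    """Approximate keyword tokenizer -> pattern_capture -> lowercase."""
    # keyword tokenizer: the whole input is a single token
    token = text
    # pattern_capture: collect every captured group of every match
    captures = [
        group
        for match in SPLCHAR_PATTERN.finditer(token)
        for group in match.groups()
        if group is not None
    ]
    # Assumed rule: if nothing matches, the original token is kept anyway;
    # otherwise it is kept only when preserve_original is set.
    tokens = [token] + captures if (preserve_original or not captures) else captures
    # lowercase filter runs on every emitted token
    return [t.lower() for t in tokens]


print(splchar_analyze("Are you ready to change the climate?"))
```

So title.splchar holds the whole lowercased title as one token, plus a separate ? or / token whenever the title contains one; a bare ? query on that sub-field matches the latter.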

Index definition (the `([?/])` pattern in `splcharfilter` can be extended for future requirements)

{
    "settings": {
        "analysis": {
            "filter": {
                "splcharfilter": {
                    "type": "pattern_capture",
                    "preserve_original": true,
                    "patterns": [
                        "([?/])"
                    ]
                }
            },
            "analyzer": {
                "splcharanalyzer": {
                    "tokenizer": "keyword",
                    "filter": [
                        "splcharfilter",
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "splchar": {
                        "type": "text",
                        "analyzer": "splcharanalyzer"
                    }
                }
            }
        }
    }
}

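Search query (change the `query` value to match what you are searching for; note that only 2 fields are searched)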

{
  "query": {
    "query_string": {
      "query": "\\?",
      "fields": ["title", "title.splchar"]
    }
  }
}
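In query_string syntax ? is a wildcard, so a literal ? has to be escaped as \?, and because the request body is JSON, that backslash must itself be escaped as \\ (the Elasticsearch query_string documentation describes this for reserved characters). A minimal Python sketch of building such a body follows. escape_query_string and build_query are hypothetical helpers, and the reserved-character set is taken from the query_string docs minus < and >, which the docs say cannot be escaped at all.

```python
import json
from typing import Dict, List

# query_string reserved characters (per the Elasticsearch docs).
# "<" and ">" are deliberately left out: they cannot be escaped and
# have to be removed from the query string instead.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_string(term: str) -> str:
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in term)


def build_query(term: str, fields: List[str]) -> Dict:
    return {
        "query": {
            "query_string": {
                "query": escape_query_string(term),
                "fields": fields,
            }
        }
    }


body = json.dumps(build_query("?", ["title", "title.splchar"]))
print(body)
```

json.dumps doubles the backslash on its own, so the printed body contains "\\?" and Elasticsearch receives the query string \?. Building the body with a JSON serializer (or the client library) avoids hand-escaping mistakes.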

Search results

"hits": [
            {
                "_index": "pattern-capture",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0341108,
                "_source": {
                    "title": "Are you ready to change the climate?"
                }
            },
            {
                "_index": "pattern-capture",
                "_type": "_doc",
                "_id": "4",
                "_score": 1.0341108,
                "_source": {
                    "title": "What are the effects of direct public transfers on social solidarity?"
                }
            }
        ]

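P.S.: To keep the answer short, I haven't listed every search query and its output, but anyone can create the index, change the search query, and see that it works as expected.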

Answer 2

Taking the following example from the chat as the basis:

Some example titles: 

title: Climate: The case of Nigerian agriculture
title: Are you ready to change the climate?
title: A literature review with a particular focus on the school staff
title: What are the effects of direct public transfers on social solidarity?
title: Community-Led Practical and/or Social Support Interventions for Adults Living at Home.

If I search by only "?" then it should return the 2nd and 4th results.
If I search by "/" then it should return only last record.
Search by climate then 1st and 2nd results.
Search by climate? then 1st, 2nd, and 4th results.

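The solution requires creating analyzers for the following cases:

Case 1: searching for special characters. I treat these as punctuation, e.g. /, ?, etc.
Case 2: searching for a keyword together with special characters, e.g. climate?
Case 3: searching for a keyword only, e.g. climate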

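For case 1, we'll use the pattern tokenizer, but instead of using the pattern to split the text, we'll use it to extract the special characters as tokens. To do this, we set "group": 0 when defining the tokenizer. For example, for the text xyz a/b pq?, the generated tokens will be / and ?.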
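The effect of "group": 0 can be approximated locally in Python: every whole match of the pattern becomes a token and the text between matches is dropped. This sketch assumes that Java's \p{Punct} (ASCII punctuation, no special regex flags) corresponds to Python's string.punctuation; splchar_tokens is a hypothetical name.

```python
import re
import string
from typing import List

# Java's \p{Punct} matches the 32 ASCII punctuation characters,
# the same set as Python's string.punctuation.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def splchar_tokens(text: str) -> List[str]:
    # "group": 0 -> each complete match is emitted as a token;
    # everything between the matches is discarded.
    return [match.group(0) for match in PUNCT.finditer(text)]


print(splchar_tokens("xyz a/b pq?"))  # ['/', '?']
```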

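For case 2, we'll create a custom analyzer with lowercase as the filter (for case-insensitive matching) and whitespace as the tokenizer (so special characters stay attached to keywords). For example, for the text How many?, the generated tokens will be how and many?.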
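A rough Python equivalent of ci_whitespace, assuming str.split() is close enough to the whitespace tokenizer for these examples (both split on runs of whitespace and leave punctuation inside tokens):

```python
from typing import List


def ci_whitespace(text: str) -> List[str]:
    # whitespace tokenizer: split on whitespace, keep punctuation attached
    # lowercase filter: make matching case-insensitive
    return [token.lower() for token in text.split()]


print(ci_whitespace("How many?"))  # ['how', 'many?']
```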

For case 3, we'll use the standard analyzer, which is the default analyzer.
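For the example titles, the standard analyzer behaves roughly like "lowercase, then keep runs of word characters". The sketch below uses that crude approximation (standard_approx is a hypothetical name). The real standard tokenizer follows Unicode UAX #29 word boundaries, so it differs on inputs such as don't, which it keeps as one token. The sketch shows why climate? and Climate: both reduce to climate in the title field.

```python
import re
from typing import List


def standard_approx(text: str) -> List[str]:
    # Crude stand-in for the standard analyzer: lowercase, then keep runs
    # of word characters. Punctuation such as "?" or ":" is dropped.
    return re.findall(r"\w+", text.lower())


print(standard_approx("Climate: The case"))  # ['climate', 'the', 'case']
print(standard_approx("climate?"))           # ['climate']
```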

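The next step is to create sub-fields for title. title will be of type text and uses the standard analyzer by default. This mapping property will have two sub-fields: withSplChar, of type text with the analyzer created for case 2 (ci_whitespace), and splChars, of type text with the analyzer created for case 1 (splchar).

Now let's see the above in action: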

PUT test
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "splchar": {
          "type": "pattern",
          "pattern": "\\p{Punct}",
          "group": 0
        }
      },
      "analyzer": {
        "splchar": {
          "tokenizer": "splchar"
        },
        "ci_whitespace": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "withSplChar": {
            "type": "text",
            "analyzer": "ci_whitespace"
          },
          "splChars": {
            "type": "text",
            "analyzer": "splchar"
          }
        }
      }
    }
  }
}

Now let's index the documents from the example above:

POST test/_bulk
{"index":{"_id":"1"}}
{"title":"Climate: The case of Nigerian agriculture"}
{"index":{"_id":"2"}}
{"title":"Are you ready to change the climate?"}
{"index":{"_id":"3"}}
{"title":"A literature review with a particular focus on the school staff"}
{"index":{"_id":"4"}}
{"title":"What are the effects of direct public transfers on social solidarity?"}
{"index":{"_id":"5"}}
{"title":"Community-Led Practical and/or Social Support Interventions for Adults Living at Home."}

Search for `?`

POST test/_search
{
  "query": {
    "query_string": {
      "query": "\\?",
      "fields": ["title", "title.withSplChar", "title.splChars"]
    }
  }
}

Results:

"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.8025915,
    "_source" : {
      "title" : "Are you ready to change the climate?"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 0.8025915,
    "_source" : {
      "title" : "What are the effects of direct public transfers on social solidarity?"
    }
  }
]

Search for `climate`

POST test/_search
{
  "query": {
    "query_string": {
      "query": "climate",
      "fields": ["title", "title.withSplChar", "title.splChars"]
    }
  }
}

Results:

"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0341107,
    "_source" : {
      "title" : "Climate: The case of Nigerian agriculture"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.98455274,
    "_source" : {
      "title" : "Are you ready to change the climate?"
    }
  }
]

Search for `climate?`

POST test/_search
{
  "query": {
    "query_string": {
      "query": "climate\\?",
      "fields": ["title", "title.withSplChar", "title.splChars"]
    }
  }
}

Results:

"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.5366155,
    "_source" : {
      "title" : "Are you ready to change the climate?"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0341107,
    "_source" : {
      "title" : "Climate: The case of Nigerian agriculture"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 0.8025915,
    "_source" : {
      "title" : "What are the effects of direct public transfers on social solidarity?"
    }
  }
]
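As a cross-check of the expected behaviour from the chat (? returns 2 and 4, / returns 5, climate returns 1 and 2, climate? returns 1, 2 and 4), the sketch below analyses each title and the search term with rough Python stand-ins for the three analyzers. A document counts as a hit if any sub-field shares a token with the term. This is a simplification made for illustration: it only handles single-term queries, ignores scoring, and the standard-analyzer stand-in is approximate. All function names are hypothetical.

```python
import re
import string
from typing import Callable, Dict, List

# Rough local stand-ins for the analyzers of the "test" index.
# Assumption: close enough to Elasticsearch for these example titles.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def standard_approx(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())  # ~ standard analyzer


def ci_whitespace(text: str) -> List[str]:
    return [token.lower() for token in text.split()]  # whitespace + lowercase


def splchar(text: str) -> List[str]:
    return [m.group(0) for m in PUNCT.finditer(text)]  # pattern, group 0


FIELDS: Dict[str, Callable[[str], List[str]]] = {
    "title": standard_approx,
    "title.withSplChar": ci_whitespace,
    "title.splChars": splchar,
}

TITLES: Dict[str, str] = {
    "1": "Climate: The case of Nigerian agriculture",
    "2": "Are you ready to change the climate?",
    "3": "A literature review with a particular focus on the school staff",
    "4": "What are the effects of direct public transfers on social solidarity?",
    "5": "Community-Led Practical and/or Social Support Interventions for Adults Living at Home.",
}


def search(term: str) -> List[str]:
    """Return ids of titles sharing a token with `term` in any sub-field.

    `term` is the unescaped single search term (e.g. "climate?"). Each
    field analyses the term with its own analyzer, as query_string does
    for multi-field queries; a field whose analyzer yields no token for
    the term cannot match.
    """
    hits = []
    for doc_id, title in TITLES.items():
        for analyze in FIELDS.values():
            if set(analyze(term)) & set(analyze(title)):
                hits.append(doc_id)
                break
    return hits


for term in ["?", "/", "climate", "climate?"]:
    print(term, search(term))
```

The split of work between sub-fields is visible here: for climate? the title field contributes document 1 (via climate), title.splChars contributes document 4 (via ?), and document 2 matches on all three.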

Tags: vb.net, special-characters, query-string, elasticsearch, elasticsearch-query