Search by Special Character using ElasticSearch

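I am using ElasticSearch in a VB.NET project. Normal search, i.e. searching by any word, works fine. Now, per a new requirement, I also want to search by a special character, i.e. ?. I am passing ? as a regular search term, but it does not work.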

Code:

client.CreateIndex(Function(d) d.Analysis(Function(z) z.Analyzers(Function(a) a.Add("nGram_analyzer", Get_nGram_analyzer()).
Add("whitespace_analyzer", Get_whitespace_analyzer()).
Add("autocmp", New Nest.CustomAnalyzer() With {.Tokenizer = "edgeNGram", .Filter = {"lowercase"}})).
Tokenizers(Function(t) t.Add("edgeNGram", New Nest.EdgeNGramTokenizer With {.MinGram = 1, .MaxGram = 20})).
TokenFilters(Function(t) t.Add("nGram_filter", Get_nGram_filter()))).
Index(Of view_Article).AddMapping(Of view_Article)(ArticleMapping)

Private Shared Function Get_nGram_filter() As NgramFilter

        Return New NgramFilter With {
            .MinGram = 1,
            .MaxGram = 20,
            .token_chars = New List(Of String) From {"letter", "digit", "punctuation", "symbol"}
        }
End Function

Private Shared Function Get_nGram_analyzer() As CustomAnalyzer
        Return New CustomAnalyzer() With {
            .Tokenizer = "whitespace",
            .Filter = New List(Of String)() From {"lowercase", "asciifolding", "nGram_filter"}
        }
End Function

Private Shared Function Get_whitespace_analyzer() As CustomAnalyzer
        Return New CustomAnalyzer() With {
            .Tokenizer = "whitespace",
            .Filter = New List(Of String)() From {"lowercase", "asciifolding"}
        }
End Function

Search query:

"query": {
    "query_string": {
      "query": "\?",
      "fields": [
        "title"
      ],
      "default_operator": "and",
      "analyze_wildcard": true
    }
  }

Note: I want to search in several ways, i.e. by keyword, by keyword + special character, or by special character only.

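I am changing my answer based on the discussion with @jeeten. The answer given by @Nishant would also work, but it has the following functional and non-functional issues:

Functional issue:

  1. Only the ? and / special characters should be searchable, whereas that approach makes every punctuation character searchable.

Non-functional issues:

  1. It indexes 3 fields in different formats, which increases the index size on disk and also puts more pressure on memory, since Elasticsearch caches the inverted index for better search performance.

  2. Likewise, every search has to query all three fields, and searching more fields again hurts performance.

  3. Tokens of the title field are duplicated across the three fields.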

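My solution

To address the functional and non-functional issues above, I used the [pattern_capture][1] token filter to index only ? and /. It also sets "preserve_original": true so that the original token is indexed as well, which supports searches like foo?.

I am also indexing only 2 fields and searching on only those two fields, to improve performance.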

Index definition (the pattern list in splcharfilter can be extended for future requirements)

{
    "settings": {
        "analysis": {
            "filter": {
                "splcharfilter": {
                    "type": "pattern_capture",
                    "preserve_original": true,
                    "patterns": [
                        "([?/])"
                    ]
                }
            },
            "analyzer": {
                "splcharanalyzer": {
                    "tokenizer": "keyword",
                    "filter": [
                        "splcharfilter",
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "splchar": {
                        "type": "text",
                        "analyzer": "splcharanalyzer"
                    }
                }
            }
        }
    }
}
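To make the effect of this analyzer easier to picture, here is a minimal Python sketch that approximates the tokens splcharanalyzer would produce. It is only an emulation under the assumption that pattern_capture emits the original token plus one token per capture-group match; the function name splchar_analyze is hypothetical and not part of Elasticsearch.

```python
import re

# Same capture pattern as splcharfilter in the index definition.
SPL_CHAR_PATTERN = re.compile(r"([?/])")


def splchar_analyze(text):
    """Approximate splcharanalyzer: keyword tokenizer, pattern_capture, lowercase."""
    # The keyword tokenizer emits the whole input as a single token.
    original = text
    # preserve_original keeps that token; each capture-group match is added too.
    tokens = [original] + [m.group(1) for m in SPL_CHAR_PATTERN.finditer(original)]
    # The lowercase filter runs last.
    return [t.lower() for t in tokens]


print(splchar_analyze("Are you ready to change the climate?"))
```

The real token stream for any text can be checked against the index with the _analyze API, which is the authoritative source if the sketch and the cluster disagree.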

Search query (change the query string as needed; note that only 2 fields are searched)

{
  "query": {
    "query_string": {
      "query": "\\?",
      "fields": ["title", "title.splchar"]
    }
  }
}
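Because ? is a wildcard in the query_string syntax, it has to be escaped with a backslash, and that backslash has to be escaped again when the request body is written as JSON. The sketch below shows one way to build such a body from client code. The helper escape_query_string is hypothetical, and the reserved-character set is my reading of the query_string documentation (< and > are left out because they cannot be escaped).

```python
import json

# Characters that the query_string syntax treats as operators. The two-character
# operators && and || are covered by escaping & and | individually.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_string(text):
    """Prefix each reserved character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


body = {
    "query": {
        "query_string": {
            "query": escape_query_string("climate?"),
            "fields": ["title", "title.splchar"],
        }
    }
}

# json.dumps doubles the backslash, which is what the JSON request body needs.
print(json.dumps(body))
```

Letting the JSON serializer add the second backslash avoids hand-writing escape sequences in the request body.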

Search results

"hits": [
            {
                "_index": "pattern-capture",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0341108,
                "_source": {
                    "title": "Are you ready to change the climate?"
                }
            },
            {
                "_index": "pattern-capture",
                "_type": "_doc",
                "_id": "4",
                "_score": 1.0341108,
                "_source": {
                    "title": "What are the effects of direct public transfers on social solidarity?"
                }
            }
        ]

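P.S. I have not listed every search query and its output, to keep the answer short, but anyone can index the documents and change the search query to see that it works as expected.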

Taking the following examples from the chat as the basis:

Some example titles: 

title: Climate: The case of Nigerian agriculture
title: Are you ready to change the climate?
title: A literature review with a particular focus on the school staff
title: What are the effects of direct public transfers on social solidarity?
title: Community-Led Practical and/or Social Support Interventions for Adults Living at Home.

If I search by only "?" then it should return the 2nd and 4th results.
If I search by "/" then it should return only last record.
Search by climate then 1st and 2nd results.
Search by climate? then 1st, 2nd, and 4th results.

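The solution requires creating analyzers for the following cases:

  1. Searching for special characters. I am treating these as punctuation, e.g. / and ?
  2. Searching for a keyword together with a special character, e.g. climate?
  3. Searching for a keyword, e.g. climate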

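For case 1 we will use the pattern tokenizer, but instead of using the pattern to split the text, we will use it to extract the special characters as tokens. To do this we set "group": 0 when defining the tokenizer. E.g. for the text xyz a/b pq?, the generated tokens will be / and ?

For case 2, we will create a custom analyzer with lowercase as the filter (for case-insensitive matching) and whitespace as the tokenizer (to keep special characters attached to the keyword). E.g. for the text How many?, the generated tokens will be how and many?

For case 3, we will use the standard analyzer, which is the default analyzer.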
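The three cases can be sketched in Python to see which tokens each analyzer would roughly produce. This is only an approximation: the function names are made up for illustration, and standard_approx is a crude stand-in for the real standard analyzer, which follows Unicode text segmentation rules.

```python
import re
import string

# Java's \p{Punct} covers the ASCII punctuation characters, which is the same
# set as string.punctuation in Python.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def splchar(text):
    """Case 1: pattern tokenizer with group 0, so each match becomes a token."""
    return PUNCT.findall(text)


def ci_whitespace(text):
    """Case 2: whitespace tokenizer followed by the lowercase filter."""
    return [token.lower() for token in text.split()]


def standard_approx(text):
    """Case 3: rough stand-in for the standard analyzer (words, lowercased)."""
    return re.findall(r"\w+", text.lower())


print(splchar("xyz a/b pq?"))
print(ci_whitespace("How many?"))
print(standard_approx("climate?"))
```

Each analyzer keeps a different part of the text, which is why a query containing both a keyword and a special character can match through any of the three.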

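The next step is to create sub-fields for title. title will be of type text and will use the standard analyzer by default. This mapping property will have two sub-fields: withSplChar, of type text with the analyzer created for case 2 (ci_whitespace), and splChars, of type text with the analyzer created for case 1 (splchar).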

Now let's see the above in action:

PUT test
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "splchar": {
          "type": "pattern",
          "pattern": "\\p{Punct}",
          "group": 0
        }
      },
      "analyzer": {
        "splchar": {
          "tokenizer": "splchar"
        },
        "ci_whitespace": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "withSplChar": {
            "type": "text",
            "analyzer": "ci_whitespace"
          },
          "splChars": {
            "type": "text",
            "analyzer": "splchar"
          }
        }
      }
    }
  }
}

Now let's index the documents from the example above:

POST test/_bulk
{"index":{"_id":"1"}}
{"title":"Climate: The case of Nigerian agriculture"}
{"index":{"_id":"2"}}
{"title":"Are you ready to change the climate?"}
{"index":{"_id":"3"}}
{"title":"A literature review with a particular focus on the school staff"}
{"index":{"_id":"4"}}
{"title":"What are the effects of direct public transfers on social solidarity?"}
{"index":{"_id":"5"}}
{"title":"Community-Led Practical and/or Social Support Interventions for Adults Living at Home."}
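Before running the queries, the expected matches can be sanity-checked with a small simulation. The sketch below is an assumption-laden model, not Elasticsearch itself: it treats a document as a hit if the query and the title share at least one token in any of the three fields, uses simplified stand-ins for the analyzers, and ignores scoring and query_string parsing.

```python
import re
import string

PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")

# One stand-in analyzer per field, keyed by the field names in the mapping.
ANALYZERS = {
    "title": lambda t: re.findall(r"\w+", t.lower()),  # rough standard analyzer
    "title.withSplChar": lambda t: t.lower().split(),  # ci_whitespace
    "title.splChars": PUNCT.findall,                   # splchar
}

TITLES = {
    1: "Climate: The case of Nigerian agriculture",
    2: "Are you ready to change the climate?",
    3: "A literature review with a particular focus on the school staff",
    4: "What are the effects of direct public transfers on social solidarity?",
    5: "Community-Led Practical and/or Social Support Interventions for Adults Living at Home.",
}


def search(query):
    """Return ids of titles sharing at least one token with the query in any field."""
    hits = []
    for doc_id, title in TITLES.items():
        for analyze in ANALYZERS.values():
            if set(analyze(query)) & set(analyze(title)):
                hits.append(doc_id)
                break
    return hits


for q in ["?", "/", "climate", "climate?"]:
    print(repr(q), search(q))
```

Under these assumptions the simulated matches agree with the expectations listed in the chat example, although the order of real results depends on scoring.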

Search for ?

POST test/_search
{
  "query": {
    "query_string": {
      "query": "\\?",
      "fields": ["title", "title.withSplChar", "title.splChars"]
    }
  }
}

Result:

   "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.8025915,
        "_source" : {
          "title" : "Are you ready to change the climate?"
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.8025915,
        "_source" : {
          "title" : "What are the effects of direct public transfers on social solidarity?"
        }
      }
    ]

Search for climate

POST test/_search
{
  "query": {
    "query_string": {
      "query": "climate",
      "fields": ["title", "title.withSplChar", "title.splChars"]
    }
  }
}

Result:

"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0341107,
    "_source" : {
      "title" : "Climate: The case of Nigerian agriculture"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.98455274,
    "_source" : {
      "title" : "Are you ready to change the climate?"
    }
  }
]

Search for climate?

POST test/_search
{
  "query": {
    "query_string": {
      "query": "climate\\?",
      "fields": ["title", "title.withSplChar", "title.splChars"]
    }
  }
}

Result:

"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.5366155,
    "_source" : {
      "title" : "Are you ready to change the climate?"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0341107,
    "_source" : {
      "title" : "Climate: The case of Nigerian agriculture"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 0.8025915,
    "_source" : {
      "title" : "What are the effects of direct public transfers on social solidarity?"
    }
  }
]