Create an EdgeNGram analyzer supporting both sides in Azure Search

Question

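When defining a custom analyzer for Azure Search, there is an option to define a token filter from this list. I am trying to support searching by both prefix and infix. For example: if a field contains the name: 123 456, I would like the searchable terms to contain:

When using EdgeNGramTokenFilterV2, which seems to do the trick, there is an option to define a "side" property, but only "front" and "back" are supported, not both. The "front" (default) value generates this list:

and "back" generates:

I tried chaining two EdgeNGramTokenFilterV2 filters, but the second filter runs on the output of the first, so it creates terms that are a combination of the two, such as "2" or "5":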

1
12
123
23
3
4
45
456
56
6
2 // Unwanted
5 // Unwanted
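The unwanted terms can be reproduced outside the service. Below is a minimal Python sketch, assuming a simplified model of an edge n-gram filter with a minimum gram length of 1 and whitespace tokenization. The helpers `edge_ngrams` and `apply_filter` are hypothetical names used only for illustration and are not part of any Azure Search API. It shows that feeding the output of a "front" filter into a "back" filter yields terms that neither filter produces on its own.

```python
def edge_ngrams(token, side="front", min_gram=1, max_gram=15):
    """Return the edge n-grams of one token, anchored at the front or the back."""
    upper = min(max_gram, len(token))
    if side == "front":
        return [token[:n] for n in range(min_gram, upper + 1)]
    return [token[-n:] for n in range(min_gram, upper + 1)]


def apply_filter(tokens, side):
    """Apply the simplified edge n-gram filter to every token in a stream."""
    out = []
    for token in tokens:
        out.extend(edge_ngrams(token, side))
    return out


tokens = "123 456".split()               # stand-in for the tokenizer
front = apply_filter(tokens, "front")    # prefixes of each token
back = apply_filter(tokens, "back")      # suffixes of each token

# Chaining: the "back" filter sees the prefixes, not the original tokens.
chained = apply_filter(front, "back")

wanted = set(front) | set(back)
unwanted = sorted(set(chained) - wanted)
print("front:   ", front)
print("back:    ", back)
print("unwanted:", unwanted)
```

In this model the chained stream still contains every wanted term, but it also contains the suffixes of each prefix, which is where "2" (from "12") and "5" (from "45") come from.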

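I also tried using the "reverse" token filter, but this reverses everything, and the results are still wrong.

I am using only one search field ("Name") and would like it to stay that way. (I thought of adding a separate field named "name_reverse" with a different analyzer, but this is very inefficient and would cause a lot of trouble when connecting the search engine to the data source.)

For reference, this is the current index creation request: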

{
  "name": "testindexboth",
  "fields": [
    {"name": "id", "type": "Edm.String", "key": true},
    {"name": "Name", "type": "Edm.String", "searchable": true, "analyzer": "myAnalyzer"}
  ],
  "analyzers": [
    {
      "name": "myAnalyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": ["front_filter", "back_filter"]
    }
  ],
  "tokenFilters": [
    {
      "name": "front_filter",
      "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
      "maxGram": 15,
      "side": "front"
    },
    {
      "name": "back_filter",
      "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
      "maxGram": 15,
      "side": "back"
    }
  ]
}

Is there a way to combine the two without them corrupting each other's results?

Answer 1

Add two fields to your index, each with a different custom analyzer: one for prefixes and one for suffixes. When querying, search against both fields.
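As a rough illustration of that layout, the sketch below builds an index definition with one field per analyzer and prints it as JSON. It is only a sketch under stated assumptions: the field name `NameSuffix` and the analyzer names `prefixAnalyzer` and `suffixAnalyzer` are hypothetical, and the token filter settings are copied from the question. Nothing is sent to the service; the script only assembles the request bodies.

```python
import json


def edge_filter(name, side, max_gram=15):
    """Token filter definition, using the same settings as in the question."""
    return {
        "name": name,
        "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
        "maxGram": max_gram,
        "side": side,
    }


def custom_analyzer(name, token_filter):
    """Custom analyzer that applies exactly one edge n-gram filter."""
    return {
        "name": name,
        "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
        "tokenizer": "standard_v2",
        "tokenFilters": [token_filter],
    }


index_definition = {
    "name": "testindexboth",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        # Prefix terms: analyzed with the "front" filter only.
        {"name": "Name", "type": "Edm.String", "searchable": True,
         "analyzer": "prefixAnalyzer"},
        # Suffix terms: hypothetical second field, "back" filter only.
        {"name": "NameSuffix", "type": "Edm.String", "searchable": True,
         "analyzer": "suffixAnalyzer"},
    ],
    "analyzers": [
        custom_analyzer("prefixAnalyzer", "front_filter"),
        custom_analyzer("suffixAnalyzer", "back_filter"),
    ],
    "tokenFilters": [
        edge_filter("front_filter", "front"),
        edge_filter("back_filter", "back"),
    ],
}

# Query body that targets both fields; the search term is an arbitrary example.
search_request = {"search": "23", "searchFields": "Name,NameSuffix"}

print(json.dumps(index_definition, indent=2))
print(json.dumps(search_request))
```

Because each analyzer applies a single filter, no field ever receives the combined terms. The cost is that the same source value has to be written to both fields when documents are indexed.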


search

analyzer

azure-cognitive-search

elasticsearch-analyzers