Azure Search - Additional Stop Words

Question

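When creating an index definition in Azure Search, is there a way to add additional stop words for that index? For example, if you are indexing street names, you might want to strip out Road, Close, Avenue, and so on.

Also, if a field is set to non-searchable, i.e. the whole value is indexed as a single term, what happens to something like Birken Court Road? Is the indexed term Birken Court? Many thanks.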

Answer 1

You can define an additional set of stop words using a custom analyzer. For example:

{
 "name":"myindex",
 "fields":[
    {
       "name":"id",
       "type":"Edm.String",
       "key":true,
       "searchable":false
    },
    {
       "name":"text",
       "type":"Edm.String",
       "searchable":true,
       "analyzer":"my_analyzer"
    }
 ],
 "analyzers":[
    {
       "name":"my_analyzer",
       "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
       "tokenizer":"standard_v2",
       "tokenFilters":[
          "lowercase",
          "english_stopwords",
          "my_stopwords"
       ]
    }
 ],
 "tokenFilters":[
    {
       "name":"english_stopwords",
       "@odata.type":"#Microsoft.Azure.Search.StopwordsTokenFilter",
       "stopwordsList":"english"
    },
    {
       "name":"my_stopwords",
       "@odata.type":"#Microsoft.Azure.Search.StopwordsTokenFilter",
       "stopwords": ["road", "avenue"]
    }
 ]
}

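In this index definition, I set a custom analyzer on the text field. It uses the standard tokenizer (standard_v2), the lowercase token filter, and two stopwords token filters: one for the standard English stop words and one for the additional set of stop words. You can test the behavior of your custom analyzer with the Analyze API, for example:

Request: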

{
   "text":"going up the road",
   "analyzer": "my_analyzer"
}

Response:

{
  "tokens": [
    {
      "token": "going",
      "startOffset": 0,
      "endOffset": 5,
      "position": 0
    },
    {
      "token": "up",
      "startOffset": 6,
      "endOffset": 8,
      "position": 1
    }
  ]
}
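
The request above could also be sent from a script. Here is a minimal Python sketch of how that might look, using only the standard library. The service name, index name, API key, and api-version below are placeholders I made up for illustration, so substitute your own. The helper names build_analyze_request and token_texts are likewise hypothetical. The sketch only builds the request; the line that actually calls the service is left commented out.

```python
import json
import urllib.request

# Placeholders, not real values: substitute your own service, index, and key.
SERVICE_NAME = "my-service"
INDEX_NAME = "myindex"
API_KEY = "<your-api-key>"
API_VERSION = "2020-06-30"  # assumed; use a version your service supports


def build_analyze_request(text, analyzer):
    """Build (but do not send) a POST request for the Analyze API."""
    url = (
        f"https://{SERVICE_NAME}.search.windows.net"
        f"/indexes/{INDEX_NAME}/analyze?api-version={API_VERSION}"
    )
    # Same body shape as the request shown above.
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": API_KEY}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def token_texts(response_json):
    """Extract just the token strings from an Analyze API response body."""
    return [t["token"] for t in response_json["tokens"]]


request = build_analyze_request("going up the road", "my_analyzer")

# To actually call the service, uncomment:
# with urllib.request.urlopen(request) as resp:
#     print(token_texts(json.load(resp)))
```

Applied to the response shown above, token_texts would return the two surviving tokens, going and up.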

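Analyzers are not applied to non-searchable fields, so the stop words in your example would not be removed; Road would stay part of the value. To learn more about query and document processing, see: How full text search works in Azure Search.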
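
To make the difference concrete, here is a rough Python simulation of the analyzer chain. It is not the service's actual implementation: the regex is a crude stand-in for the standard_v2 tokenizer, the English stop word list is abbreviated, and analyze and index_terms are hypothetical names used only for this sketch.

```python
import re

# Abbreviated stand-in for the built-in English list; the real list is longer.
ENGLISH_STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}
# Same extra words as the my_stopwords filter in the index definition.
MY_STOPWORDS = {"road", "avenue"}


def analyze(text):
    """Approximate my_analyzer: tokenize, lowercase, then drop stop words."""
    tokens = re.findall(r"\w+", text)        # crude stand-in for standard_v2
    tokens = [t.lower() for t in tokens]     # lowercase filter
    tokens = [t for t in tokens if t not in ENGLISH_STOPWORDS]
    return [t for t in tokens if t not in MY_STOPWORDS]


def index_terms(value, searchable):
    """Model of the answer: searchable fields are analyzed, others kept whole."""
    return analyze(value) if searchable else [value]


print(index_terms("Birken Court Road", searchable=True))   # ['birken', 'court']
print(index_terms("Birken Court Road", searchable=False))  # ['Birken Court Road']
```

Note the order of the filters in this sketch: lowercasing runs before the stop word filters, so the lowercase entries road and avenue also match Road and Avenue in the input.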
