How to make an accent-insensitive search in Elasticsearch with the NEST C# client?

Question

I'm new to Elasticsearch.

Let's say we have a class like this:

public class A
{
    public string name;
}

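We have 2 documents with names such as "Ayşe" and "Ayse".

Now, I want to be able to store the names with their accents, but when I search I want the query to be accent-insensitive while the results still come back exactly as stored (with accents).

For example: when I search for "Ayse" or "Ayşe", it should return both "Ayşe" and "Ayse" as they are stored (with the accent).

Right now, when I search for "Ayse" it only returns "Ayse", but I also want to get "Ayşe" as a result.

Looking at the Elasticsearch documentation, I saw that I need to use folding to achieve this. But I don't understand how to do it with NEST attributes/functions.

By the way, I'm currently using AutoMap to create the mappings and, if possible, I'd like to keep using it.

I've been searching for an answer for 2 days now and still haven't figured it out.

What needs to be changed, and where? Can you give me a code example?

Thanks.

Edit 1:

I figured out how to create a sub-field of a property with an analyzer, and I get results with a term-based query against that sub-field.

Now, I know I can do a multi-field search, but is there a way to include sub-fields in a full-text search?

Thanks.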

Answer 1

You can configure an analyzer to perform analysis on the text at index time, index the result into a multi_field to use at query time, and still keep the original source to return in the results. Based on your question, it sounds like you want a custom analyzer that uses the asciifolding token filter to convert characters to their ASCII equivalents at both index and search time.
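To make the idea of ASCII folding concrete, here is a minimal C# sketch of a table-based folding function. It is not Elasticsearch's asciifolding filter: the class name TurkishFolding and its mapping table are hypothetical, and the table covers only the Turkish letters relevant to this question, whereas Lucene's ASCIIFoldingFilter maps thousands of characters.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical, heavily reduced sketch of ASCII folding. Like Lucene's
// ASCIIFoldingFilter it replaces characters via a mapping table, but this
// table only covers the Turkish letters used in this example.
public static class TurkishFolding
{
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['ç'] = 'c', ['Ç'] = 'C',
        ['ğ'] = 'g', ['Ğ'] = 'G',
        ['ı'] = 'i', ['İ'] = 'I',
        ['ö'] = 'o', ['Ö'] = 'O',
        ['ş'] = 's', ['Ş'] = 'S',
        ['ü'] = 'u', ['Ü'] = 'U',
    };

    // Replaces every mapped character with its ASCII equivalent and leaves
    // all other characters untouched.
    public static string Fold(string token)
    {
        var sb = new StringBuilder(token.Length);
        foreach (char c in token)
        {
            sb.Append(Map.TryGetValue(c, out char folded) ? folded : c);
        }
        return sb.ToString();
    }
}
```

For example, TurkishFolding.Fold("Ayşe") returns "Ayse", and "Ayse" comes back unchanged, which is why both names end up sharing the folded token.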

Given the following document:

public class Document
{
    public int Id { get; set;}
    public string Name { get; set; }
}

The custom analyzer can be set up when creating the index; we can specify the mapping at the same time:

// NEST 2.x syntax. documentsIndex holds the index name (e.g. "documents").
client.CreateIndex(documentsIndex, ci => ci
    .Settings(s => s
        .NumberOfShards(1)
        .NumberOfReplicas(0)
        .Analysis(analysis => analysis
            .TokenFilters(tokenfilters => tokenfilters
                .AsciiFolding("folding-preserve", ft => ft
                    .PreserveOriginal()
                )
            )
            .Analyzers(analyzers => analyzers
                .Custom("folding-analyzer", c => c
                    .Tokenizer("standard")
                    .Filters("standard", "folding-preserve")
                )
            )
        )
    )
    .Mappings(m => m
        .Map<Document>(mm => mm
            .AutoMap()
            .Properties(p => p
                .String(s => s
                    .Name(n => n.Name)
                    .Fields(f => f
                        .String(ss => ss
                            .Name("folding")
                            .Analyzer("folding-analyzer")
                        )
                    )
                    .NotAnalyzed()
                )
            )
        )
    )
);

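Here I've created an index with one shard and no replicas (you may want to change this for your environment), along with a custom analyzer, folding-analyzer, that uses the standard tokenizer in conjunction with the standard token filter and a folding-preserve token filter. The latter performs ASCII folding and stores the original tokens in addition to the folded ones (more on why this may be useful in a moment).

I've also mapped the Document type, mapping the Name property as a multi_field, with the default field not_analyzed (useful for aggregations) and a .folding sub-field that is analyzed with folding-analyzer. Calling .AutoMap() first infers mappings for all properties from the POCO, and the .Properties() call that follows overrides the inferred mapping for Name, so you can keep using AutoMap. Elasticsearch also stores the original source document by default.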
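The token stream that folding-analyzer produces can be approximated as follows. This is a hypothetical sketch: whitespace splitting stands in for the standard tokenizer, the standard token filter is omitted because it does nothing in Elasticsearch 2.x, and the mapping table is a tiny stand-in for the real one. It shows the effect of preserve_original: a token that folding changes is emitted twice (original and folded), an unchanged token only once.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical approximation of "folding-analyzer":
//   tokenizer          -> whitespace split here (the real "standard" tokenizer
//                         follows Unicode word-break rules)
//   "standard" filter  -> omitted, it does nothing in Elasticsearch 2.x
//   "folding-preserve" -> ASCII folding with preserve_original enabled
public static class FoldingAnalyzerSketch
{
    // Tiny illustrative mapping table (lowercase Turkish letters only).
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['ç'] = 'c', ['ğ'] = 'g', ['ı'] = 'i', ['ö'] = 'o', ['ş'] = 's', ['ü'] = 'u',
    };

    private static string Fold(string token)
    {
        var sb = new StringBuilder(token.Length);
        foreach (char c in token)
        {
            sb.Append(Map.TryGetValue(c, out char folded) ? folded : c);
        }
        return sb.ToString();
    }

    public static List<string> Analyze(string text)
    {
        var tokens = new List<string>();
        string[] words = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            string folded = Fold(word);
            // preserve_original keeps the original token next to the folded one,
            // but only when folding changed something. Both share one position;
            // this sketch lists the original first, as in the diagram.
            tokens.Add(word);
            if (folded != word)
            {
                tokens.Add(folded);
            }
        }
        return tokens;
    }
}
```

With this sketch, Analyze("Ayşe") yields the tokens Ayşe and Ayse, while Analyze("Ayse") yields only Ayse.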

Now let's index some documents:

// assumes the client's default index (ConnectionSettings.DefaultIndex) is documentsIndex
client.Index<Document>(new Document { Id = 1, Name = "Ayse" });
client.Index<Document>(new Document { Id = 2, Name = "Ayşe" });

// refresh the index after indexing to ensure the documents just indexed are
// available to be searched
client.Refresh(documentsIndex);

Finally, search for Ayşe:

var response = client.Search<Document>(s => s
    .Query(q => q
        .QueryString(qs => qs
            .Fields(f => f
                .Field(c => c.Name.Suffix("folding"))
            )
            .Query("Ayşe")
        )
    )
);

yields

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.163388,
    "hits" : [ {
      "_index" : "documents",
      "_type" : "document",
      "_id" : "2",
      "_score" : 1.163388,
      "_source" : {
        "id" : 2,
        "name" : "Ayşe"
      }
    }, {
      "_index" : "documents",
      "_type" : "document",
      "_id" : "1",
      "_score" : 0.3038296,
      "_source" : {
        "id" : 1,
        "name" : "Ayse"
      }
    } ]
  }
}

Two things to point out here:

First, _source contains the original text sent to Elasticsearch, so by using response.Documents you get back the original names. For example,

string.Join(",", response.Documents.Select(d => d.Name));

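will give you "Ayşe,Ayse".

Second, remember that we preserved the original tokens in the asciifolding token filter? Doing so means that we can run a query that undergoes analysis and matches accent-insensitively, while accent sensitivity still counts towards scoring. In the example above, the query Ayşe scores higher against document 2 (Ayşe) than against document 1 (Ayse), because both the tokens Ayşe and Ayse are indexed for the former, whereas only Ayse is indexed for the latter. When a query that undergoes analysis is run against the folding sub-field of Name, the query is analyzed with folding-analyzer and a search for matching tokens is performed: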

Index time
----------

document 1 name: Ayse --analysis--> Ayse

document 2 name: Ayşe --analysis--> Ayşe, Ayse  


Query time
-----------

query_string query input: Ayşe --analysis--> Ayşe, Ayse

search for documents with tokens in the name.folding field matching Ayşe or Ayse
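The matching step in the diagram can be simulated with a small inverted index. In this hypothetical sketch the token lists are copied from the diagram instead of being produced by an analyzer, and counting matched query tokens is only a crude stand-in for Lucene's relevance scoring, but it reproduces the ranking in the response above: both documents match, and document 2 ranks first because it matches both query tokens.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical simulation of matching against the name.folding field.
// The indexed tokens are taken from the diagram, not from a real analyzer.
public static class FoldingMatchSketch
{
    private static readonly Dictionary<int, string[]> IndexedTokens = new Dictionary<int, string[]>
    {
        [1] = new[] { "Ayse" },         // document 1: "Ayse" -> Ayse
        [2] = new[] { "Ayşe", "Ayse" }, // document 2: "Ayşe" -> Ayşe, Ayse
    };

    // Returns matching document ids with the number of query tokens each one
    // matched, most matches first (ties broken by id).
    public static List<(int Id, int Matches)> Search(string[] queryTokens)
    {
        // Build the inverted index: token -> ids of documents containing it.
        var inverted = new Dictionary<string, HashSet<int>>();
        foreach (var doc in IndexedTokens)
        {
            foreach (string token in doc.Value)
            {
                if (!inverted.TryGetValue(token, out var ids))
                {
                    ids = new HashSet<int>();
                    inverted[token] = ids;
                }
                ids.Add(doc.Key);
            }
        }

        // A document matches if it contains any query token; count how many.
        var matches = new Dictionary<int, int>();
        foreach (string token in queryTokens)
        {
            if (inverted.TryGetValue(token, out var ids))
            {
                foreach (int id in ids)
                {
                    matches[id] = matches.TryGetValue(id, out int n) ? n + 1 : 1;
                }
            }
        }

        return matches
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key)
            .Select(kv => (Id: kv.Key, Matches: kv.Value))
            .ToList();
    }
}
```

Searching with the query tokens Ayşe and Ayse returns document 2 (2 matches) before document 1 (1 match); searching with just Ayse, as for the query "Ayse", still returns both documents.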
