.NET ElasticSearch NEST - NGram Analyzer on Multiple Fields for Partial Matches
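I am trying to use ElasticSearch to do partial matching on multiple fields using NGram, but I get 0 results after building my index. This doesn't come naturally to me, and I can't even get NGram to work on a single field. This is a passion project, and I really want the new search to handle partial word matches. I tried using fuzziness, but it started scoring incorrect matches too high.
Index creation: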
var nGramFilters = new List<string> { "lowercase", "asciifolding", "nGram_filter" };
Client.Indices.Create(CurrentIndexName, c => c
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("ngram_analyzer", cc => cc
                    .Tokenizer("ngram_tokenizer")
                    .Filters(nGramFilters))
            )
            .Tokenizers(tz => tz
                .NGram("ngram_tokenizer", td => td
                    .MinGram(2)
                    .MaxGram(20)
                    .TokenChars(
                        TokenChar.Letter,
                        TokenChar.Digit,
                        TokenChar.Punctuation,
                        TokenChar.Symbol
                    )
                )
            )
        )
    )
    .Map<Package>(map => map
        .AutoMap()
        .Properties(p => p
            .Text(t => t
                .Name(n => n.Title)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.Summary)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PestControlledBy)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideControlsThesePests)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideInstructions)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideActiveIngredients)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticidesContainingThisActiveIngredient)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideSafeOn)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideNotSafeOn)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
        )
    )
);
Query:
var result = _client.Search<Package>(s => s
    .From((form.Page - 1) * form.PageSize)
    .Size(form.PageSize)
    .Query(query => query
        .MultiMatch(m => m
            .Fields(f => f
                .Field(p => p.Title.Suffix("ngram"), 1.5)
                .Field(p => p.Summary.Suffix("ngram"), 1.1)
                .Field(p => p.PestControlledBy.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideControlsThesePests.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideInstructions.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideActiveIngredients.Suffix("ngram"), 1.0)
                .Field(p => p.PesticidesContainingThisActiveIngredient.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideSafeOn.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideNotSafeOn.Suffix("ngram"), 1.0)
            )
            .Operator(Operator.Or)
            .Query(form.Query)
        )
    )
    .Highlight(h => h
        .PreTags("<strong>")
        .PostTags("</strong>")
        .Encoder(HighlighterEncoder.Html) // https://github.com/elastic/elasticsearch-net/issues/3091
        .Fields(fs => fs
                .Field(f => f.Summary.Suffix("ngram")),
            fs => fs
                .Field(p => p.PestControlledBy.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideControlsThesePests.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideInstructions.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideActiveIngredients.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticidesContainingThisActiveIngredient.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideSafeOn.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideNotSafeOn.Suffix("ngram"))
                .NumberOfFragments(10)
                .FragmentSize(250)
        )
    )
);
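Am I in the right ballpark? I tried using the default analyzer, but I wasn't matching things like "cat dandelion" to "cat's ear dandelion". With the default analyzer the whole word has to match, but I want partial matches to work for things like "petal" and "petals". Any step in the right direction would be appreciated. I am new to ElasticSearch and NEST and have only been using them for a week or so.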
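The client.Indices.Create call is invalid for two reasons:
- By default, the difference between MinGram and MaxGram cannot be greater than 1, which is why you get this error: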
Elasticsearch.Net.ElasticsearchClientException: Request failed to execute. Call: Status code 400 from: PUT /my_index1?pretty=true&error_trace=true. ServerError: Type: illegal_argument_exception Reason: "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [18]. This limit can be set by changing the [index.max_ngram_diff] index level setting."
You can read more about this error here.
- There is no filter named nGram_filter; you need to change this filter to ngram.
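The two fixes above could be applied roughly as follows. This is a minimal sketch rather than a verified drop-in replacement: it assumes NEST 7.x, a local cluster at http://localhost:9200, the index name my_index1 taken from the error message, and a cut-down hypothetical Package class with only a Title property. It raises index.max_ngram_diff (the setting named in the error message) to 18 so that MinGram(2) and MaxGram(20) are accepted, and it uses the built-in ngram filter name.

```csharp
using System;
using Nest;

// Hypothetical stand-in for the question's Package class (only one field shown).
public class Package
{
    public string Title { get; set; }
}

public static class IndexSetup
{
    public static void Main()
    {
        // Assumption: a local single-node cluster; adjust the URI as needed.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
        var client = new ElasticClient(settings);

        var response = client.Indices.Create("my_index1", c => c
            .Settings(st => st
                // Raise the limit named in the error message: 20 - 2 = 18.
                .Setting("index.max_ngram_diff", 18)
                .Analysis(an => an
                    .Analyzers(anz => anz
                        .Custom("ngram_analyzer", cc => cc
                            .Tokenizer("ngram_tokenizer")
                            // "ngram" replaces the undefined "nGram_filter".
                            .Filters("lowercase", "asciifolding", "ngram")
                        )
                    )
                    .Tokenizers(tz => tz
                        .NGram("ngram_tokenizer", td => td
                            .MinGram(2)
                            .MaxGram(20)
                            .TokenChars(
                                TokenChar.Letter,
                                TokenChar.Digit,
                                TokenChar.Punctuation,
                                TokenChar.Symbol
                            )
                        )
                    )
                )
            )
            .Map<Package>(map => map
                .AutoMap()
                .Properties(p => p
                    .Text(t => t
                        .Name(n => n.Title)
                        .Fields(f => f
                            .Text(tt => tt
                                .Name("ngram")
                                .Analyzer("ngram_analyzer")
                            )
                        )
                    )
                )
            )
        );

        // True only if Elasticsearch accepted the index definition.
        Console.WriteLine(response.IsValid);
    }
}
```

An alternative that needs no extra setting is to keep the gap between MinGram and MaxGram within the default limit of 1, for example MinGram(2) with MaxGram(3).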
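I found these issues by checking the index mapping in Elasticsearch (localhost:9200/YOUR_INDEX_NAME/_mapping), where I saw that no mapping had been applied. The second step was to look at what DebugInformation on the index creation response had to tell me: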
var createIndexResponse = await client.Indices.CreateAsync("my_index1", ..);
createIndexResponse.DebugInformation
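A slightly fuller version of that check might look like the sketch below. It assumes NEST 7.x and a local cluster at http://localhost:9200; the empty index body (c => c) is only a placeholder for the real settings and mappings, and the IndexCheck class name is hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using Nest;

public static class IndexCheck
{
    public static async Task Main()
    {
        // Assumption: a local single-node cluster; adjust the URI as needed.
        var client = new ElasticClient(
            new ConnectionSettings(new Uri("http://localhost:9200")));

        // Placeholder body: the real settings and mappings would go here.
        var createIndexResponse = await client.Indices.CreateAsync("my_index1", c => c);

        if (!createIndexResponse.IsValid)
        {
            // Short reason reported by the server, if there is one.
            Console.WriteLine(createIndexResponse.ServerError?.Error?.Reason);
            // Full request/response trace, including the status code.
            Console.WriteLine(createIndexResponse.DebugInformation);
        }
    }
}
```

Checking IsValid after every index creation call makes a rejected request visible immediately, instead of surfacing later as a search that returns 0 results.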
Hope that helps.