Elasticsearch Nest Not Matching Properly
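Using Elasticsearch and Nest 2.x.
Because of some (crazy) user requirements, I need to copy all searchable fields into a single field, lowercase it, and ignore spaces. When the user types something to search for, I lowercase it and remove the spaces, and use the result as the search string.
For example, for "The quick brown fox" I want Elasticsearch to treat the text as "thequickbrownfox" for search purposes.
The following searches should match the document above:
- the
- quick
- t
- brown
- nf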
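The requirement above boils down to: lowercase both sides, drop whitespace, then test whether the query occurs anywhere in the text. The sketch below states that intended behavior in plain C# (no Elasticsearch or NEST involved), so analyzer results can be compared against it. `Normalize` and `ShouldMatch` are hypothetical helper names, and "whitespace" is assumed to mean anything `char.IsWhiteSpace` accepts, which is broader than the single space character.

```csharp
using System;
using System.Linq;

// Hypothetical helper: drop every whitespace character and lowercase the rest,
// so "The quick brown fox" becomes "thequickbrownfox".
string Normalize(string text) =>
    new string(text.Where(c => !char.IsWhiteSpace(c)).ToArray()).ToLowerInvariant();

// Hypothetical helper: the intended rule is "the normalized query occurs
// anywhere inside the normalized document text" (an infix match).
bool ShouldMatch(string document, string userQuery) =>
    Normalize(document).IndexOf(Normalize(userQuery), StringComparison.Ordinal) >= 0;

var sampleDoc = "The quick brown fox";
Console.WriteLine(Normalize(sampleDoc)); // thequickbrownfox
foreach (var search in new[] { "the", "quick", "t", "brown", "nf", "the q" })
{
    // Every search in this list prints True.
    Console.WriteLine($"{search} -> {ShouldMatch(sampleDoc, search)}");
}
```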
Here is how I build the index:
var customerSearchIdxDesc = new CreateIndexDescriptor(Constants.ElasticSearch.CustomerSearchIndexName)
.Settings(f =>
f.Analysis(analysis => analysis
.Analyzers(analyzers => analyzers
.Custom(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram, a => a
.Filters("lowercase")
.Tokenizer(Constants.ElasticSearch.TokenizerNames.NoWhitespaceNGram)))
.Tokenizers(tokenizers => tokenizers
.NGram(Constants.ElasticSearch.TokenizerNames.NoWhitespaceNGram, t => t
.MinGram(1)
.MaxGram(500)
.TokenChars(TokenChar.Digit, TokenChar.Letter, TokenChar.Punctuation, TokenChar.Symbol)
)
)
)
)
.Mappings(ms => ms.Map<ServiceModel.DtoTypes.Customer.SearchResult>(m => m
.AutoMap()
.Properties(p => p
.String(n => n.Name(c => c.ContactName)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
.String(n => n.Name(c => c.CustomerName)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
.String(n => n.Name(c => c.City)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
.String(n => n.Name(c => c.StateAbbreviation)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
.String(n => n.Name(c => c.Country)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
.String(n => n.Name(c => c.PostalCode)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
.String(n => n.Name(Constants.ElasticSearch.CombinedSearchFieldName)
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.LowercaseNGram))
)
)
);
As you can see, I apply the lowercase filter in the analyzer and set TokenChars so that whitespace would be left out (well, that was the idea; it doesn't work).
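Why the TokenChars idea falls short can be sketched in plain C#: token_chars lists the character classes that are kept inside a token, so any other character (including whitespace) ends the current token instead of being skipped, and each token is then n-grammed on its own. This is a simplified model, not the Elasticsearch implementation; `OriginalAnalyzer` is a hypothetical name, and every non-whitespace character is assumed to count as a token char.

```csharp
using System;
using System.Collections.Generic;

// Rough model of the original "NoWhitespaceNGram" tokenizer followed by the
// lowercase filter. Assumption: every non-whitespace character is a token char
// (the real setting keeps letters, digits, punctuation and symbols).
List<string> OriginalAnalyzer(string text, int minGram = 1, int maxGram = 500)
{
    var result = new List<string>();
    // Whitespace is not a token char, so it splits the input into separate runs.
    var runs = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    foreach (var run in runs)
    {
        // Each run is n-grammed on its own, so no gram ever spans two words.
        for (int start = 0; start < run.Length; start++)
            for (int len = minGram; len <= maxGram && start + len <= run.Length; len++)
                result.Add(run.Substring(start, len).ToLowerInvariant());
    }
    return result;
}

var sampleGrams = OriginalAnalyzer("The quick brown fox");
Console.WriteLine(sampleGrams.Contains("quick")); // True
Console.WriteLine(sampleGrams.Contains("theq"));  // False: "the" and "q" are in different runs
Console.WriteLine(sampleGrams.Contains("nf"));    // False: "n" and "f" are in different runs
```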
Here is what I use to search:
var response = client.Search<DtoTypes.Customer.SearchResult>(s =>
s.From(0)
.Take(Constants.ElasticSearch.MaxResults)
.Query(q => q
.MatchPhrase(mp => mp
.Field(Constants.ElasticSearch.CombinedSearchFieldName)
.Query(query))));
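Here are the problems:
- Whitespace does not seem to be ignored (it seems to match only on individual words).
- Partial matching only seems to work for suffixes. For example, searching "aby" does not match "abyss", but "yss" does.
- Searching across words does not work: for "the quick", searching "theq" returns no matches.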
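The suffix-only behavior is consistent with how match_phrase treats n-grams, assuming the ngram tokenizer emits grams ordered by start offset and then by length ("a", "ab", "aby", and so on, then "b", "by", ...), each at the next position. match_phrase runs the query through the same analyzer and requires all of its tokens at consecutive positions. The plain C# model below (hypothetical names `NGramsInOrder` and `PhraseMatches`, single words only) reproduces the "aby"/"yss" observation under those assumptions.

```csharp
using System;
using System.Collections.Generic;

// Assumed model of the ngram tokenizer's output: grams sorted by start offset,
// then by length, each one at the next position (position increment of 1).
List<string> NGramsInOrder(string word, int minGram = 1, int maxGram = 500)
{
    var grams = new List<string>();
    for (int start = 0; start < word.Length; start++)
        for (int len = minGram; len <= maxGram && start + len <= word.Length; len++)
            grams.Add(word.Substring(start, len));
    return grams;
}

// match_phrase (modelled): the query's gram list must appear as one
// contiguous run inside the document's gram list.
bool PhraseMatches(string documentWord, string query)
{
    var doc = NGramsInOrder(documentWord.ToLowerInvariant());
    var q = NGramsInOrder(query.ToLowerInvariant());
    for (int offset = 0; offset + q.Count <= doc.Count; offset++)
    {
        int i = 0;
        while (i < q.Count && doc[offset + i] == q[i]) i++;
        if (i == q.Count) return true;
    }
    return false;
}

Console.WriteLine(PhraseMatches("abyss", "aby")); // False: after "aby" the document has "abys", not "b"
Console.WriteLine(PhraseMatches("abyss", "yss")); // True: the last six grams line up exactly
```

In this model any suffix matches, because a suffix's grams are exactly the tail of the word's gram list, while a prefix or infix breaks the run as soon as the document continues with a longer gram.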
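I believe this solved my problem: I added a character filter that strips spaces, added it to the analyzer, and switched to the EdgeNGram tokenizer. I'm not sure this is the optimal setup, but it seems to work.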
var customerSearchIdxDesc = new CreateIndexDescriptor(Constants.ElasticSearch.CustomerSearchIndexName)
.Settings(f =>
f.Analysis(analysis => analysis
.CharFilters(cf => cf
.PatternReplace(Constants.ElasticSearch.FilterNames.RemoveWhitespace, pr => pr
.Pattern(" ")
.Replacement(string.Empty)
)
)
.Analyzers(analyzers => analyzers
.Custom(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer, a => a
.Filters("lowercase")
.CharFilters(Constants.ElasticSearch.FilterNames.RemoveWhitespace)
.Tokenizer(Constants.ElasticSearch.TokenizerNames.DefaultTokenizer)
)
)
.Tokenizers(tokenizers => tokenizers
.EdgeNGram(Constants.ElasticSearch.TokenizerNames.DefaultTokenizer, t => t
.MinGram(1)
.MaxGram(500)
)
)
)
)
.Mappings(ms => ms.Map<ServiceModel.DtoTypes.Customer.SearchResult>(m => m
.AutoMap()
.Properties(p => p
.String(n => n.Name(c => c.ContactName)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
.String(n => n.Name(c => c.CustomerName)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
.String(n => n.Name(c => c.City)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
.String(n => n.Name(c => c.StateAbbreviation)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
.String(n => n.Name(c => c.Country)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
.String(n => n.Name(c => c.PostalCode)
.CopyTo(fs => fs.Field(Constants.ElasticSearch.CombinedSearchFieldName))
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
.String(n => n.Name(Constants.ElasticSearch.CombinedSearchFieldName)
.Index(FieldIndexOption.Analyzed)
.Analyzer(Constants.ElasticSearch.AnalyzerNames.DefaultAnalyzer))
)
)
);
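A rough plain C# model of the answer's analysis chain: the pattern_replace char filter deletes every " ", the edge_ngram tokenizer (no TokenChars configured, so the whole value stays one token) emits prefixes up to 500 characters, and the lowercase filter lowercases them. The position handling (one position per gram, required consecutively by match_phrase) is an assumption, and `AnswerAnalyzer` and `PhraseMatches` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Model of the answer's analyzer: char filter, then edge n-grams, then lowercase.
List<string> AnswerAnalyzer(string text, int minGram = 1, int maxGram = 500)
{
    var withoutSpaces = text.Replace(" ", string.Empty); // char filter: Pattern(" ") -> ""
    var prefixes = new List<string>();
    // edge_ngram: only grams anchored at the start of the token (prefixes).
    for (int len = minGram; len <= maxGram && len <= withoutSpaces.Length; len++)
        prefixes.Add(withoutSpaces.Substring(0, len).ToLowerInvariant());
    return prefixes;
}

// match_phrase (modelled): analyzed query tokens must appear at consecutive positions.
bool PhraseMatches(string documentText, string query)
{
    var docTokens = AnswerAnalyzer(documentText);
    var queryTokens = AnswerAnalyzer(query);
    for (int offset = 0; offset + queryTokens.Count <= docTokens.Count; offset++)
    {
        int i = 0;
        while (i < queryTokens.Count && docTokens[offset + i] == queryTokens[i]) i++;
        if (i == queryTokens.Count) return true;
    }
    return false;
}

var fox = "The quick brown fox";
Console.WriteLine(string.Join(",", AnswerAnalyzer(fox).GetRange(0, 4))); // t,th,the,theq
Console.WriteLine(PhraseMatches(fox, "theq")); // True
Console.WriteLine(PhraseMatches(fox, "nf"));   // False: only prefixes are indexed
```

Under these assumptions a query matches only when its space-free, lowercased form is a prefix of a copied value, so "theq" matches while infix fragments such as "nf" do not. Each field copied into the combined field is analyzed as a separate value, so a prefix of, say, the city or postal code can still match.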