Exact search with ElasticSearch using EdgeNGram
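I am using Elastic to search text in Greek and Latin characters. My main problem is that I cannot do exact searches. I use an edgeNGram filter at index time, but I would like to control its min and max at search time according to the length of my search word. For example, if I type "titanox" I get "ΤΙΤΑΝΙΟΥ" first and "TITANOX" second. This is my index creation: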
var response = client.CreateIndex(index, s => s
.Settings(s1 => s1
.NumberOfShards(5)
.NumberOfReplicas(5)
.Analysis(a => a.TokenFilters(t => t.IcuTransform("greeklatin", it => it.Id("Greek-Latin; NFD; [:Nonspacing Mark:] Remove; NFC")//
.Direction(IcuTransformDirection.Forward)) //
.IcuTransform("latingreek", lg => lg.Id("Greek-Latin; NFD; [:Nonspacing Mark:] Remove; NFC")
.Direction(IcuTransformDirection.Reverse))
.EdgeNGram("greekedge", ed => ed.MaxGram(7)
.MinGram(1)
.Side(EdgeNGramSide.Front))
.Stop("greekstop", sw => sw.StopWords())
.Lowercase("greeklowercase", gl => gl.Language(Language.Greek.ToString()))
.KeywordMarker("greekkeywords", gk => gk.Keywords(""))
.Stemmer("greekstemmer", gs => gs.Language(Language.Greek.ToString())))
.Analyzers(a1 => a1
.Custom("greek", t => t.Tokenizer("standard")
.Filters("greekedge", "greekstop", "greeklowercase", "greekkeywords", "greekstemmer", "greeklatin")))))
.Mappings(m => m.Map(type, mt => mt.Properties(c => c.Text(c1 => c1.Name("id").Analyzer("greek"))
.Text(c2 => c2.Name("brand").Analyzer("greek"))
.Text(c3 => c3.Name("service").Analyzer("greek"))
.Text(c4 => c4.Name("servicegroupdesc").Analyzer("greek"))
.Text(c5 => c5.Name("servicecategorydesc").Analyzer("greek"))
.Text(c6 => c6.Name("partscategory").Analyzer("greek"))
.Text(c7 => c7.Name("partsid").Analyzer("greek"))
.Text(c8 => c8.Name("partsdesc").Analyzer("greek"))))));
This is my search:
var response = client.Search<Cars>(n => n
.Index(index)
.Type(type)
.Query(m => m.MultiMatch(q => q
.Analyzer(analyzername)
//.MinimumShouldMatch("100%")
.Query("*" + searchWord + "*")
.Fields(f=>f.Field(fieldsForSearchList[0]))
.Fuzziness(Fuzziness.EditDistance(0))))
.Size(searchSize)
.From(0)
.TrackScores(true)
);
One solution that might work is to add a second query to this one that boosts the exact word the user types. That could make the search more exact.
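To make the overlap concrete, here is a small stand-alone sketch of why both products match the same input. It is plain C#, not NEST code, and it only counts shared terms; it does not reproduce Elasticsearch's scoring. The names `EdgeNGramDemo`, `EdgeNGrams` and `SharedTerms` are hypothetical, and the string "titaniou" is an assumed lowercase transliteration of "ΤΙΤΑΝΙΟΥ", not verified output of the ICU transform.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EdgeNGramDemo
{
    // Front edge n-grams of one token, mirroring MinGram/MaxGram on the filter.
    public static List<string> EdgeNGrams(string token, int minGram, int maxGram)
    {
        var grams = new List<string>();
        int longest = Math.Min(maxGram, token.Length);
        for (int len = minGram; len <= longest; len++)
        {
            grams.Add(token.Substring(0, len));
        }
        return grams;
    }

    // Number of distinct query terms that also occur among the indexed terms.
    public static int SharedTerms(IEnumerable<string> queryTerms, IEnumerable<string> indexedTerms)
    {
        var indexed = new HashSet<string>(indexedTerms);
        return queryTerms.Distinct().Count(t => indexed.Contains(t));
    }

    public static void Main()
    {
        // Assumption: both documents end up indexed as lowercase Latin prefixes.
        var indexedGreek = EdgeNGrams("titaniou", 1, 7); // stands in for ΤΙΤΑΝΙΟΥ
        var indexedLatin = EdgeNGrams("titanox", 1, 7);  // stands in for TITANOX

        // Query analyzed WITH the edge n-gram filter (same analyzer as the index).
        var ngramQuery = EdgeNGrams("titanox", 1, 7);
        // Query analyzed WITHOUT the edge n-gram filter: one whole-word term.
        var wholeWordQuery = new List<string> { "titanox" };

        Console.WriteLine(SharedTerms(ngramQuery, indexedGreek));     // 5
        Console.WriteLine(SharedTerms(ngramQuery, indexedLatin));     // 7
        Console.WriteLine(SharedTerms(wholeWordQuery, indexedGreek)); // 0
        Console.WriteLine(SharedTerms(wholeWordQuery, indexedLatin)); // 1
    }
}
```

In this toy model the n-grammed query shares the prefixes "t" through "titan" with both documents, while the whole-word term only hits "TITANOX". That is the effect an additional boosted clause on the word exactly as typed would rely on, under the assumption that the clause is analyzed without the edge n-gram filter.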