Ignore Special Characters (tittles) in Examine search
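I'm using Umbraco v6 with Examine search (not full Lucene queries). This is a Latin/South American website. I asked my colleagues how they type tittles (the accent marks over letters) when entering a search or a URL, and they all said they don't: they just use "regular" characters (A-Z, a-z).
I know how to strip special characters OUT of a string before passing it to Examine, but I need the opposite: Examine should ignore the special characters in the indexed properties so that they match the plain query. I have many nodes with tittles in their names (the node name is one of the properties I'm searching).
Posts I've looked at: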
- http://shazwazza.com/categories/Examine?p=2
- Ignore special characters in Examine
- https://groups.google.com/forum/#!topic/umbraco-dev/W6cWyPOc43Y
I've tried writing a Lucene query (or so I think), but without any success.
// q is my query from QueryString
var searcher = ExamineManager.Instance.SearchProviderCollection["CustomSearchSearcher"];
//var query = searcher.CreateSearchCriteria().Field("nodeName", q).Or().Field("description", q).Compile();
//var searchResults = searcher.Search(query).OrderByDescending(x => x.Score).TakeWhile(x => x.Score > 0.05f);
var searchResults = searcher.Search(Global.RemoveSpecialCharacters(q), true).OrderByDescending(x => x.Score).TakeWhile(x => x.Score > 0.05f);
Global class:
// Requires: using System.Text;
// Keeps digits, ASCII letters, '.', '_' and a few lowercase Spanish accented letters; drops everything else.
public static string RemoveSpecialCharacters(string str)
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < str.Length; i++)
    {
        char c = str[i];
        if ((c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') // a single 'A'..'z' range would also let [ \ ] ^ ` through
            || c == '.' || c == '_'
            || c == 'á' || c == 'é' || c == 'í' || c == 'ñ' || c == 'ó' || c == 'ú')
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}
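As mentioned above, I need the special characters (tittles) removed from what Lucene indexes, not from the incoming query.
From: https://our.umbraco.org/documentation/reference/searching/examine/overview-explanation
I've also read about "Analyzers", but I've never worked with them before and don't know which one to get/install/add in VS, etc. Is that the better way to do this?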
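A custom analyzer is the answer.
The Umbraco forum answered this question here: https://our.umbraco.org/forum/developers/extending-umbraco/16396-Examine-and-accents-for-portuguese-language
Create an analyzer that lowercases the terms and folds accented characters to their plain ASCII equivalents (é → e, ñ → n):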
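// Requires: using Lucene.Net.Analysis; using Lucene.Net.Analysis.Standard;
// Case- and accent-insensitive analyzer: tokenize, lowercase, then fold accents to ASCII.
public class CIAIAnalyser : Analyzer
{
    public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
    {
        StandardTokenizer tokenizer = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, reader);
        tokenizer.SetMaxTokenLength(255);
        TokenStream stream = new StandardFilter(tokenizer);
        stream = new LowerCaseFilter(stream);
        return new ASCIIFoldingFilter(stream); // e.g. "josé" is indexed as "jose"
    }
}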
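To see roughly what that chain does to a field value without setting up Lucene, here is a dependency-free approximation. It is only a sketch under simplifying assumptions: it splits on any character that is not a letter or digit (much cruder than StandardTokenizer's grammar) and folds accents by stripping combining marks after FormD normalization (which covers Spanish/Portuguese accents but not everything ASCIIFoldingFilter maps). The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: a rough stand-in for the CIAIAnalyser chain
// (tokenize -> lowercase -> fold accents). NOT Lucene's actual implementation.
public static class AnalyzerChainSketch
{
    public static List<string> Analyze(string text)
    {
        var tokens = new List<string>();
        if (string.IsNullOrEmpty(text)) return tokens;

        var current = new StringBuilder();

        // Decompose first so "é" becomes "e" + U+0301 (a NonSpacingMark).
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;                                  // fold: drop the accent mark

            if (char.IsLetterOrDigit(c))
                current.Append(char.ToLowerInvariant(c));  // lowercase, like LowerCaseFilter
            else
                Flush(tokens, current);                    // any other character ends a token
        }
        Flush(tokens, current);
        return tokens;
    }

    // Emits the buffered token (if any) and resets the buffer.
    private static void Flush(List<string> tokens, StringBuilder current)
    {
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
            current.Clear();
        }
    }
}
```

With this sketch, "Café São Paulo, Peñalolén" becomes the terms cafe, sao, paulo and penalolen, which is why a user typing plain "sao paulo" can match once both index and query go through the same folding.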
Then strip the accents from the search input as well, so both sides use the same plain characters:
// Requires: using System; using System.Globalization; using System.Text;
public class CleanAccent
{
    public static string RemoveDiacritics(string input)
    {
        if (String.IsNullOrEmpty(input)) return input;

        // FormD = full canonical decomposition: "é" becomes "e" + U+0301 (combining acute accent).
        string inputInFormD = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder();
        for (int idx = 0; idx < inputInFormD.Length; idx++)
        {
            // Skip the combining marks (the accents) and keep everything else.
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(inputInFormD[idx]);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(inputInFormD[idx]);
            }
        }
        // Recompose whatever remains into canonical composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
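One caveat: FormD only removes marks from letters that have a canonical decomposition. For the Spanish and Portuguese accents on this site (á, é, í, ó, ú, ñ, ü, ã, õ, ç and so on) that is enough, but letters such as ø, ß, æ or ł do not decompose, so RemoveDiacritics leaves them untouched while ASCIIFoldingFilter folds them on the index side. If such letters can appear, a sketch like the following adds a small fallback map. The class name and the map are hypothetical and deliberately incomplete; ASCIIFoldingFilter covers far more characters.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical extension of RemoveDiacritics: strip combining marks via FormD,
// then map a few letters that have no canonical decomposition.
// The fallback map is illustrative only, not a full ASCII-folding table.
public static class AccentFolder
{
    private static readonly Dictionary<char, string> Fallback = new Dictionary<char, string>
    {
        { 'ø', "o" }, { 'Ø', "O" },
        { 'ß', "ss" },
        { 'æ', "ae" }, { 'Æ', "AE" },
        { 'ł', "l" }, { 'Ł', "L" },
    };

    public static string Fold(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        var sb = new StringBuilder(input.Length);
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            // Drop accents such as U+0301 (acute) or U+0303 (tilde).
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            string replacement;
            if (Fallback.TryGetValue(c, out replacement))
                sb.Append(replacement);   // letter without a decomposition
            else
                sb.Append(c);             // plain base letter or other character
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Calling AccentFolder.Fold(q) where the answer calls CleanAccent.RemoveDiacritics(q) keeps the query-side folding a little closer to what the index-side filter produces.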
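Then reference the analyzer in ExamineSettings.config: set the analyzer attribute of both the indexer and the searcher to the analyzer's fully qualified type name (for example "YourNamespace.CIAIAnalyser, YourAssembly", using your own namespace and assembly), and rebuild the index so that existing content is re-analyzed.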