Sitecore & Lucene search auto-complete using NGram
I'm trying to set up autocomplete for content search using NGram.
Here is my Lucene index configuration:
<autocompleteSearchConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
<indexAllFields>false</indexAllFields>
<initializeOnAdd>true</initializeOnAdd>
<analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
<fieldNames hint="raw:AddFieldByFieldName">
<field
fieldName="page_title"
storageType="YES"
indexType="TOKENIZED"
vectorType="NO"
boost="1.5f"
nullValue="NULL"
emptyString="EMPTY"
type="System.String"
settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
<analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.NGramAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
</field>
</fieldNames>
</fieldMap>
<fields hint="raw:AddComputedIndexField">
<field fieldName="page_title" storageType="yes">Client.Website.Code.Search.AutoCompleteTitle, Client.Website</field>
</fields>
<fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/>
<indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/>
<indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/>
<documentBuilderType>Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder, Sitecore.ContentSearch.LuceneProvider</documentBuilderType>
</autocompleteSearchConfiguration>
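Note that I'm using the NGramAnalyzer (from Sitecore.ContentSearch.LuceneProvider.Analyzers).
When I inspect this index in Luke, I can see that it contains the correct data.
However, the following IQueryable does not return any results.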
var index = ContentSearchManager.GetIndex("INDEX NAME GOES HERE");
using (var context = index.CreateSearchContext())
{
    // Exact-match comparison against the n-gram analyzed field
    var query = context.GetQueryable<AutocompleteSearchResult>().Where(i => i.PageTitle == term);
    var result = query.GetResults();
}
Why not use StartsWith instead of ==?
See this article.
Sitecore provides an n-gram analyzer for Lucene.net (Sitecore.ContentSearch.LuceneProvider.Analyzers). If you use Solr, you can set this up in the Solr Schema.xml file.
You use the n-gram analyzer to create autocomplete functionality for search input. The analyzer breaks tokens up into unigrams, bigrams, trigrams, and so on. When a user types a word, the n-gram analyzer looks the word up in different positions, using the tokens that it generated.
You add support for autocomplete by adding a new field to the index and mapping this field to use the n-gram analyzer instead of the default. When you run the LINQ query to query that field, use the following code:
using (IProviderSearchContext context = Index.CreateSearchContext())
{
    result = context.GetQueryable<SearchResultItem>()
        .Where(i => i.Name.StartsWith("some"))
        .Take(20)
        .ToList();
}
Sitecore provides an implementation that uses trigrams and a set of English stop words. If you have other requirements, you can build a new analyzer and change these settings.
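To make the quoted description of the analyzer more concrete, here is a minimal sketch in plain C# of how a single token can be broken into unigrams, bigrams and trigrams. It does not use Sitecore or Lucene.NET at all; the class NGramSketch and the method NGrams are hypothetical names made up for illustration, and the real NGramAnalyzer may emit its tokens in a different order and also applies stop words.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: NOT the Sitecore or Lucene.NET implementation.
public static class NGramSketch
{
    // Returns every substring of `token` whose length lies between
    // minGram and maxGram (inclusive), shortest grams first.
    public static List<string> NGrams(string token, int minGram, int maxGram)
    {
        var grams = new List<string>();
        for (int size = minGram; size <= maxGram; size++)
        {
            // Slide a window of `size` characters across the token.
            for (int start = 0; start + size <= token.Length; start++)
            {
                grams.Add(token.Substring(start, size));
            }
        }
        return grams;
    }
}
```

Calling NGramSketch.NGrams("home", 1, 3) yields h, o, m, e, ho, om, me, hom, ome. A field indexed this way contains many short tokens rather than the whole title, which is worth keeping in mind when comparing the field against a complete search term.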
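Building an autocomplete search with n-grams in Sitecore is harder to implement with Lucene than with Solr, for two main reasons:
- You need to create your own analyzer, because Sitecore.ContentSearch.LuceneProvider.Analyzers.NGramAnalyzer was not built for autocomplete purposes.
- You need to prevent your search query from being analyzed by the NGramAnalyzer. With Lucene as your search provider, you don't have a convenient schema.xml where you can configure the index and query analyzers as needed. This has to be done manually in code.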
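The two points above can be sketched without any Sitecore or Lucene.NET dependency. This is only an assumption about what a custom autocomplete analyzer might do, not the approach of the linked article: at index time each word is expanded into its leading prefixes (often called edge n-grams), while at query time the typed term is merely lowercased and looked up as one token, never split into grams. AutocompleteSketch, IndexTokens and Matches are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: a stand-in for separate index-time and query-time analysis.
public static class AutocompleteSketch
{
    // Index side: emit the leading prefixes of every word in the title.
    public static HashSet<string> IndexTokens(string title, int minGram, int maxGram)
    {
        var tokens = new HashSet<string>();
        var words = title.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            int longest = Math.Min(maxGram, word.Length);
            for (int size = minGram; size <= longest; size++)
            {
                tokens.Add(word.Substring(0, size));
            }
        }
        return tokens;
    }

    // Query side: the typed term is normalized but NOT broken into grams,
    // so it must equal one of the indexed prefixes to match.
    public static bool Matches(HashSet<string> indexedTokens, string typedTerm)
    {
        return indexedTokens.Contains(typedTerm.Trim().ToLowerInvariant());
    }
}
```

With IndexTokens("Sitecore Search", 2, 10), the typed terms "Sit" and "sea" match, while "core" does not, because only word beginnings are indexed. In a real Lucene setup the same split would have to be wired up in code, as the answer notes, since there is no schema.xml to declare separate index and query analyzers.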
For more information on how to build this kind of autocomplete search, see this article.