Sitecore & Lucene search auto-complete using NGram

I am trying to set up autocomplete for content search using NGram. Here is my Lucene index configuration:

<autocompleteSearchConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
      <indexAllFields>false</indexAllFields>
      <initializeOnAdd>true</initializeOnAdd>
      <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
      <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
        <fieldNames hint="raw:AddFieldByFieldName">
          <field
            fieldName="page_title"
            storageType="YES"
            indexType="TOKENIZED"
            vectorType="NO"
            boost="1.5f"
            nullValue="NULL"
            emptyString="EMPTY"
            type="System.String"
            settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
            <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.NGramAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
          </field>
        </fieldNames>
      </fieldMap>
      <fields hint="raw:AddComputedIndexField">
        <field fieldName="page_title" storageType="yes">Client.Website.Code.Search.AutoCompleteTitle, Client.Website</field>
      </fields>
      <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/>
      <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/>
      <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/>
      <documentBuilderType>Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder, Sitecore.ContentSearch.LuceneProvider</documentBuilderType>
    </autocompleteSearchConfiguration>

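Note that I am using the NGramAnalyzer (from Sitecore.ContentSearch.LuceneProvider.Analyzers).

When I inspect this index in Luke, I can see that it contains the correct data. However, the following IQueryable does not return any results.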

var index = ContentSearchManager.GetIndex("INDEX NAME GOES HERE");
using (var context = index.CreateSearchContext())
{
    var query = context.GetQueryable<AutocompleteSearchResult>().Where(i => i.PageTitle == term);
    var result = query.GetResults();
}

Why not use StartsWith instead of ==?

See this article.

Sitecore provides an n-gram analyzer for Lucene.net (Sitecore.ContentSearch.LuceneProvider.Analyzers). If you use Solr, you can set this up in the Solr Schema.xml file.

You use the n-gram analyzer to create autocomplete functionality for search input. The analyzer breaks tokens up into unigrams, bigrams, trigrams, and so on. When a user types a word, the n-gram analyzer looks the word up in different positions, using the tokens that it generated.
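
To make the tokenization step concrete, here is a minimal sketch in plain C# of how a single token can be broken into n-grams. It does not use Lucene.net or the Sitecore analyzer. The `NGramSketch` class and its `Generate` method are hypothetical names used only for illustration, and the real analyzer's gram sizes and token handling may differ.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: not the Sitecore or Lucene.net implementation.
public static class NGramSketch
{
    // Returns every substring of `token` whose length is between minGram and maxGram,
    // shortest grams first, each size scanned from left to right.
    public static List<string> Generate(string token, int minGram, int maxGram)
    {
        if (token == null) throw new ArgumentNullException(nameof(token));
        if (minGram < 1 || maxGram < minGram)
            throw new ArgumentOutOfRangeException(nameof(minGram));

        var grams = new List<string>();
        for (int size = minGram; size <= maxGram; size++)
        {
            for (int start = 0; start + size <= token.Length; start++)
            {
                grams.Add(token.Substring(start, size));
            }
        }
        return grams;
    }
}
```

For example, `NGramSketch.Generate("site", 1, 3)` yields the unigrams s, i, t, e, the bigrams si, it, te, and the trigrams sit, ite.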

You add support for autocomplete by adding a new field to the index and mapping this field to use the n-gram analyzer instead of the default. When you run the LINQ query to query that field, use the following code:

using (IProviderSearchContext context = Index.CreateSearchContext())
        {
            result = context.GetQueryable<SearchResultItem>()
                .Where(i => i.Name.StartsWith("some"))
                .Take(20)
                .ToList();
        }

Sitecore provides an implementation that uses trigrams and a set of English stop words. If you have other requirements, you can build a new analyzer and change these settings.

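Building an autocomplete search with n-grams in Sitecore is harder with Lucene than with Solr, for two main reasons:

  1. You need to create your own analyzer, because Sitecore.ContentSearch.LuceneProvider.Analyzers.NGramAnalyzer was not built for autocomplete purposes.
  2. You need to keep your search query from being analyzed by the NGramAnalyzer. With Lucene as your search provider, you do not have a convenient schema.xml in which to configure separate index and query analyzers as needed. This has to be done manually in code.

For more information on how to build this kind of autocomplete search, see this article.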
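
The two points above can be illustrated without Lucene. The following is a rough sketch in plain C#, under the assumption that an analyzer "built for autocomplete" indexes prefix (edge) n-grams of each word, while the typed term is left unanalyzed apart from lowercasing. `AutocompleteSketch`, `Add`, and `Suggest` are hypothetical names; this is an in-memory stand-in, not a Lucene.net analyzer and not Sitecore's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: a tiny in-memory stand-in for an index.
public class AutocompleteSketch
{
    // Maps each indexed prefix to the titles that produced it.
    private readonly Dictionary<string, SortedSet<string>> _index =
        new Dictionary<string, SortedSet<string>>();

    // "Index-time analysis": lowercase, split on spaces, emit prefixes of every word.
    public void Add(string title, int minGram = 2)
    {
        var words = title.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            for (int length = minGram; length <= word.Length; length++)
            {
                string prefix = word.Substring(0, length);
                if (!_index.TryGetValue(prefix, out var titles))
                {
                    titles = new SortedSet<string>(StringComparer.Ordinal);
                    _index[prefix] = titles;
                }
                titles.Add(title);
            }
        }
    }

    // "Query time": the typed term is only trimmed and lowercased, never split
    // into n-grams, so it is matched as a whole against the indexed prefixes.
    public List<string> Suggest(string term)
    {
        string key = term.Trim().ToLowerInvariant();
        return _index.TryGetValue(key, out var titles)
            ? titles.ToList()
            : new List<string>();
    }
}
```

With the titles "Sitecore Search" and "Site Map" added, `Suggest("sit")` returns both titles, while `Suggest("ore")` returns nothing, because only word prefixes were indexed. If the query were also broken into n-grams, short fragments of the term could match unrelated titles, which is why the query side is kept unanalyzed here.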