Creating a Lucene.net Custom Analyzer

I'm trying to create a custom analyzer in Lucene.net 4.8, but I'm running into an error that I can't understand.

My analyzer code:

public class SynonymAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        String base1 = "lawnmower";
        String syn1 = "lawn mower";
        String base2 = "spanner";
        String syn2 = "wrench";

        SynonymMap.Builder sb = new SynonymMap.Builder(true);
        sb.Add(new CharsRef(base1), new CharsRef(syn1), true);
        sb.Add(new CharsRef(base2), new CharsRef(syn2), true);
        SynonymMap smap = sb.Build();

        Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_48, reader);

        TokenStream result = new StandardTokenizer(Version.LUCENE_48, reader);
        result = new SynonymFilter(result, smap, true);
        return new TokenStreamComponents(tokenizer, result);
    }
}

My indexing code is:

var fordFiesta = new Document();
fordFiesta.Add(new StringField("Id", "1", Field.Store.YES));
fordFiesta.Add(new TextField("Make", "Ford", Field.Store.YES));
fordFiesta.Add(new TextField("Model", "Fiesta 1.0 Developing", Field.Store.YES));
fordFiesta.Add(new TextField("FullText", "lawnmower Ford 1.0 Fiesta Developing spanner", Field.Store.YES));

Lucene.Net.Store.Directory directory = FSDirectory.Open(new DirectoryInfo(Environment.CurrentDirectory + @"\LuceneIndex"));

SynonymAnalyzer analyzer = new SynonymAnalyzer();

var config = new IndexWriterConfig(Version.LUCENE_48, analyzer);
var writer = new IndexWriter(directory, config);

writer.UpdateDocument(new Term("Id", "1"), fordFiesta);

writer.Flush(true, true);
writer.Commit();
writer.Dispose();

However, when I run my code it fails on the writer.UpdateDocument line with the following error:

TokenStream contract violation: Reset()/Dispose() call missing, Reset() called multiple times, or subclass does not call base.Reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

I have no idea where I've gone wrong?!

The problem is that your TokenStreamComponents is being built with a different Tokenizer than the one used in the result TokenStream: you construct two separate StandardTokenizer instances over the same reader, and the one wrapped by the SynonymFilter is never the one the components hand back, so it is consumed without Reset() being called on it. Changing it to this should fix the problem:

Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_48, reader);
TokenStream result = new SynonymFilter(tokenizer, smap, true);
return new TokenStreamComponents(tokenizer, result);
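As a quick sanity check (my own addition, not part of the original answer), you can search the rebuilt index for a synonym term that never appears literally in the document. With keepOrig set to true, the SynonymFilter injects "wrench" alongside "spanner" at index time, so a plain TermQuery for "wrench" should match. This sketch assumes the `directory` variable from the indexing code above and the standard Lucene.Net 4.8 search API:

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Open a reader on the index that was just written and committed.
using (var indexReader = DirectoryReader.Open(directory))
{
    var searcher = new IndexSearcher(indexReader);

    // "wrench" is not in the stored text, only "spanner" is; the synonym
    // token added at index time is what makes this query match.
    var hits = searcher.Search(new TermQuery(new Term("FullText", "wrench")), 10);

    Console.WriteLine(hits.TotalHits); // expect 1 hit: the Ford Fiesta document
}
```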