Exception while using Apache Lucene for stop-word removal
I am using the following code to remove stop words from input text. When tokenStream.incrementToken() runs, it throws the following exception:
java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
Code:
public static String removeStopWords(String textFile) throws Exception {
CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();
TokenStream tokenStream = new StandardTokenizer();
tokenStream = new StopFilter(tokenStream, stopWords);
StringBuilder sb = new StringBuilder();
CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
tokenStream.reset();
while (tokenStream.incrementToken()) {
String term = charTermAttribute.toString();
sb.append(term + " ");
}
return sb.toString();
}
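The exception comes from the tokenizer, not from StopFilter. The StandardTokenizer above is created without any input: removeStopWords never hands textFile to it. On its first read, the tokenizer then hits Lucene's placeholder reader, which throws this "contract violation" IllegalStateException. Below is a minimal sketch that keeps the question's Tokenizer + StopFilter chain and adds the missing Tokenizer.setReader(...) call. It also follows the consuming workflow from the TokenStream Javadoc: reset(), the incrementToken() loop, end(), close(). Assumptions: Lucene 8.x/9.x with lucene-core and the analyzers-common module (lucene-analysis-common in 9.x) on the classpath, since the import paths differ in older releases. The class name StopWordRemoval is hypothetical. The chain has no lower-casing filter, so capitalized stop words such as "The" may pass through.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical class name; import paths assume Lucene 8.x/9.x.
public class StopWordRemoval {

    public static String removeStopWords(String text) throws IOException {
        CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();

        Tokenizer tokenizer = new StandardTokenizer();
        // The step missing from the question: give the tokenizer its input.
        tokenizer.setReader(new StringReader(text));

        StringBuilder sb = new StringBuilder();
        // Closing the outer StopFilter also closes the wrapped tokenizer.
        try (TokenStream tokenStream = new StopFilter(tokenizer, stopWords)) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();                 // must precede the first incrementToken()
            while (tokenStream.incrementToken()) {
                if (sb.length() > 0) {
                    sb.append(' ');              // separate terms without a trailing space
                }
                sb.append(term.toString());
            }
            tokenStream.end();                   // must follow the last incrementToken()
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(removeStopWords("the quick brown fox jumps over the lazy dog"));
    }
}
```

Joining terms with a single separator, instead of appending term + " ", avoids the trailing space in the returned string.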
Instantiate your TokenStream as shown below, so that an Analyzer connects the input Reader to the tokenizer for you:
TokenStream tokenStream = new StandardAnalyzer().tokenStream("field",new StringReader(textFile));
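The answer's approach, expanded into a complete method, might look like the sketch below. It assumes Lucene 8.x/9.x with the analyzers-common module on the classpath, and the class name AnalyzerStopWordRemoval is hypothetical. Depending on the Lucene version, the no-argument StandardAnalyzer may not remove any stop words (newer releases default to an empty stop set). The sketch therefore passes EnglishAnalyzer's default stop set to the StandardAnalyzer(CharArraySet) constructor explicitly. StandardAnalyzer also lower-cases tokens before the stop filter runs, so "The" is removed as well.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical class name; import paths assume Lucene 8.x/9.x.
public class AnalyzerStopWordRemoval {

    public static String removeStopWords(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Resources close in reverse order: the TokenStream first, then the Analyzer.
        try (Analyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.getDefaultStopSet());
             TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text))) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();                 // still required: the Analyzer does not call it
            while (tokenStream.incrementToken()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(term.toString());
            }
            tokenStream.end();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(removeStopWords("The quick brown fox jumps over the lazy dog"));
    }
}
```

In real code, a single Analyzer instance is usually created once and reused for many calls, rather than built per call as in this self-contained sketch.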