How to apply LowerCase to a String using Lucene
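I'm starting out with Apache Lucene 8.0. I'd like to know how to convert my String `text`
variable to lowercase using Lucene. I'm not sure how to do it because I couldn't find any examples. What I want is something like this: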
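import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Assumed: Document is the asker's own class exposing text()
// (org.apache.lucene.document.Document has no text() method).
public class DocumentLowercase {
    private Analyzer analyzer;

    public Analyzer DocAnalysis(Document d) {
        analyzer = new StandardAnalyzer();
        String text = d.text();
        // TODO: convert String text into lowercase here,
        // maybe using a LowerCaseTokenizer? But how?
        return analyzer;
    }
}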
StandardAnalyzer already converts everything to lowercase!
See the documentation here: http://lucene.apache.org/core/8_0_0/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html
It says:
Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
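In practice, the lowercased text only becomes visible when you consume the analyzer's TokenStream. A minimal sketch, assuming Lucene 8.x core is on the classpath; the field name `content` and the class and method names are hypothetical:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LowercaseWithStandardAnalyzer {

    // Runs `text` through the analyzer and collects the resulting terms.
    // "content" is an arbitrary (hypothetical) field name; StandardAnalyzer
    // applies the same chain regardless of the field name.
    static List<String> analyze(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream("content", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                 // must be called before incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString()); // already lowercased by LowerCaseFilter
            }
            stream.end();                   // must be called after the last token
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // prints [hello, lucene, world]
            System.out.println(analyze(analyzer, "Hello Lucene WORLD"));
        }
    }
}
```

Note that this yields the lowercased terms (split on word boundaries, with punctuation dropped), not a lowercased copy of the whole string. Depending on the stop word set the analyzer was built with, common words may also be removed by the StopFilter.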
You can also see in the source code which components StandardAnalyzer is built from:
@Override
protected TokenStreamComponents createComponents(final String fieldName) {
    final StandardTokenizer src = new StandardTokenizer();
    src.setMaxTokenLength(maxTokenLength);
    TokenStream tok = new LowerCaseFilter(src);
    tok = new StopFilter(tok, stopwords);
    return new TokenStreamComponents(r -> {
        src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
        src.setReader(r);
    }, tok);
}
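If you only want tokenizing plus lowercasing, the same pattern can be reused in your own Analyzer subclass. A sketch, assuming Lucene 8.x core; `LowercaseOnlyAnalyzer` and `terms` are hypothetical names, and unlike StandardAnalyzer this chain has no StopFilter:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LowercaseOnlyAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Same tokenizer and lowercase step as StandardAnalyzer, but no StopFilter.
        Tokenizer source = new StandardTokenizer();
        TokenStream result = new LowerCaseFilter(source);
        return new TokenStreamComponents(source, result);
    }

    // Hypothetical helper: analyze `text` with this analyzer and collect the terms.
    static List<String> terms(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (Analyzer analyzer = new LowercaseOnlyAnalyzer();
             TokenStream stream = analyzer.tokenStream("content", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                out.add(term.toString());
            }
            stream.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // prints [the, quick, brown, fox]; "the" survives because there is no StopFilter
        System.out.println(terms("The Quick Brown Fox"));
    }
}
```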
If you want to customize your analyzer, you should look at CustomAnalyzer.
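For example, a CustomAnalyzer that chains the standard tokenizer with lowercasing might look like the sketch below. It assumes the lucene-analyzers-common module (8.x) is on the classpath, since that is where CustomAnalyzer and the factory classes live; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CustomLowercaseAnalyzer {

    // Builds tokenizer + filter chain declaratively via factory classes.
    static Analyzer build() throws IOException {
        return CustomAnalyzer.builder()
                .withTokenizer(StandardTokenizerFactory.class)
                .addTokenFilter(LowerCaseFilterFactory.class)
                .build();
    }

    // Hypothetical helper: analyze `text` and collect the lowercased terms.
    static List<String> terms(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (Analyzer analyzer = build();
             TokenStream stream = analyzer.tokenStream("content", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                out.add(term.toString());
            }
            stream.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // prints [hello, customanalyzer]
        System.out.println(terms("Hello CustomAnalyzer"));
    }
}
```

More filters (for example a stop filter) can be appended with further `addTokenFilter` calls, which is the main advantage over hand-writing `createComponents`.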