How to ignore some chars in Lucene Query (Hibernate Search)
I have indexed this entity:
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
public class MyBean {

    @Id
    private Long id;

    @Field
    private String foo;

    @Field
    private String bar;

    @Field
    private String baz;
}
For this schema:
+----+-------------+-------------+-------------+
| id | foo         | bar         | baz         |
+----+-------------+-------------+-------------+
| 11 | an example  | ignore this | ignore this |
| 12 | ignore this | an e.x.a.m. | ignore this |
| 13 | not this    | not this    | not this    |
+----+-------------+-------------+-------------+
I need to find both 11 and 12 by searching for exam.
I have tried:
FullTextEntityManager fullTextEntityManager =
        Search.getFullTextEntityManager(this.entityManager);
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory()
        .buildQueryBuilder().forEntity(MyBean.class).get();
Query textQuery = queryBuilder.keyword()
        .onFields("foo", "bar", "baz").matching("exam").createQuery();
fullTextEntityManager.createFullTextQuery(textQuery, MyBean.class).getResultList();
But this only finds entity 11, and I need 12 as well. Is that possible?
Adding a WordDelimiterFilter with the CATENATE_ALL flag to your analysis chain could be a solution.
An analyzer implementation based on StandardAnalyzer would look something like this:
// Imports for Lucene 5.x (as bundled with Hibernate Search 5); package names differ in later Lucene versions.
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.analysis.util.StopwordAnalyzerBase;

public class StandardWithWordDelim extends StopwordAnalyzerBase {

    public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

    public StandardWithWordDelim() {
        // Pass the stop words to the base class so the StopFilter below actually uses them.
        super(STOP_WORDS_SET);
    }

    @Override
    protected TokenStreamComponents createComponents(final String fieldName) {
        StandardTokenizer src = new StandardTokenizer();
        src.setMaxTokenLength(255);
        TokenStream filter = new StandardFilter(src);
        filter = new LowerCaseFilter(filter);
        filter = new StopFilter(filter, stopwords);
        // I'm inclined to add it here, so the abbreviation "t.h.e." doesn't get whacked by the StopFilter.
        filter = new WordDelimiterFilter(filter, WordDelimiterFilter.CATENATE_ALL, null);
        return new TokenStreamComponents(src, filter);
    }
}
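If you want to sanity-check the chain before reindexing, a small sketch like the one below (not part of the original answer; the class name AnalyzerCheck is made up) prints the tokens the analyzer emits for the value that is currently not matching. With only CATENATE_ALL set, "an e.x.a.m." should come out as the single token exam, since "an" is dropped as a stop word:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerCheck {
    public static void main(String[] args) throws Exception {
        // Run the problematic field value through the custom analyzer and print each token.
        try (Analyzer analyzer = new StandardWithWordDelim();
             TokenStream ts = analyzer.tokenStream("bar", "an e.x.a.m.")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString()); // expected: "exam"
            }
            ts.end();
        }
    }
}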
It doesn't look like you are using the standard analyzer (NGrams, maybe?), but you should be able to incorporate this into your analysis somewhere.
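For what it's worth, one way to wire the custom analyzer in, assuming Hibernate Search 5 and its @Analyzer annotation (a sketch only; if you already configure an NGram analyzer via @AnalyzerDef, you would add the WordDelimiterFilter to that definition instead):

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
@Analyzer(impl = StandardWithWordDelim.class) // class-level: applies to all @Field properties of this entity
public class MyBean {

    @Id
    private Long id;

    @Field
    private String foo;

    @Field
    private String bar;

    @Field
    private String baz;
}

Remember to rebuild the index (for example with the mass indexer) after changing the analyzer, since the change only affects documents indexed from that point on.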