Use a custom analyzer for the matching term with Hibernate Search
I have a field with a custom analyzer.
@Analyzer(definition = "edgeNgram")
@Field(index = Index.YES, analyze = Analyze.YES, store = Store.YES)
@Lob
String value;
This is the analyzer defined on my class.
@AnalyzerDef(name = "edgeNgram",
    tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class), // Replace accented characters by their simpler counterpart (è => e, etc.)
        @TokenFilterDef(factory = LowerCaseFilterFactory.class), // Lowercase all characters
        @TokenFilterDef(
            factory = EdgeNGramFilterFactory.class, // Generate prefix tokens
            params = {
                @org.hibernate.search.annotations.Parameter(name = "minGramSize", value = "4"),
                @org.hibernate.search.annotations.Parameter(name = "maxGramSize", value = "10")
            }
        )
    })
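To make the effect of this analyzer chain concrete, here is a minimal sketch in plain Java that imitates it without Lucene. The class EdgeNGramSketch and its methods are hypothetical names used only for illustration, and the accent folding is approximated with java.text.Normalizer, which is an assumption and not how ASCIIFoldingFilter is actually implemented.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, NOT part of Hibernate Search or Lucene: it only
// imitates the "edgeNgram" analyzer chain with plain JDK calls.
final class EdgeNGramSketch {

    // Rough stand-in for ASCII folding: decompose accented characters,
    // then drop the combining marks (é => e).
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Whitespace tokenization, folding, lowercasing, then edge n-grams
    // (prefixes) between minGram and maxGram characters long.
    static List<String> indexTokens(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String normalized = fold(word).toLowerCase(Locale.ROOT);
            int longest = Math.min(maxGram, normalized.length());
            // Words shorter than minGram produce no token at all.
            for (int len = minGram; len <= longest; len++) {
                tokens.add(normalized.substring(0, len));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Bou\u00e9es" is "Bouées" written with a Unicode escape.
        System.out.println(indexTokens("Bou\u00e9es jaunes", 4, 10));
    }
}
```

With minGramSize 4 and maxGramSize 10, the index therefore only ever contains unaccented, lowercase prefixes, so a query term has to be normalized the same way to match.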
Here is where I create the query.
query = queryBuilder
        .simpleQueryString()
        .boostedTo(3f) // This whole query is boosted so exact matches will obtain a better score
        .onFields("title.value", "keyword.values.value")
        .boostedTo(2f)
        .andField("description.values.value")
        //.withAndAsDefaultOperator()
        .matching(Arrays.stream(searchTerm.split(" ")).map(e -> e + "*").collect(Collectors.joining(" ")).toLowerCase())
        .createQuery();
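I don't know how to set the analyzer for the search term searchTerm (and I couldn't find it in the Hibernate Search documentation). Basically I started splitting it manually and lowercasing it in Java, but that doesn't seem right.
What I want is to apply a different analyzer to my query terms, for example: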
@AnalyzerDef(name = "edgeNGram_query",
    tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class), // Replace accented characters by their simpler counterpart (è => e, etc.)
        @TokenFilterDef(factory = LowerCaseFilterFactory.class) // Lowercase all characters
    })
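Do you know how to set a custom analyzer for the query terms? Why isn't it applied by default? If I search for "bouees" it works, but if I search for "bouées" it doesn't.
Thanks!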
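Solution:
My problem was that I was doing a simpleQueryString query when I should have been doing a keyword query. simpleQueryString does not seem to run the analyzer on the search term! After that, I just had to follow @yrodiere's suggestion, .overridesForField( "description.values.value", "edgeNGram_query" ), to use the correct analyzer for the search term.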
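The symptom described above ("bouees" matches, "bouées" does not) is what one would expect whenever the search term reaches the index without being normalized. The following plain-Java sketch illustrates that under simplifying assumptions: QueryAnalyzerSketch and its methods are hypothetical names, the index is modeled as a simple set of strings, and accent folding is approximated with java.text.Normalizer. None of this is Hibernate Search or Lucene code.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the index and of query-time analysis; NOT library code.
final class QueryAnalyzerSketch {

    // Imitates the "edgeNGram_query" analyzer for a single word:
    // fold accents (approximation) and lowercase, but generate NO n-grams.
    static String normalize(String word) {
        String folded = Normalizer.normalize(word, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return folded.toLowerCase(Locale.ROOT);
    }

    // Imitates index-time analysis: store every prefix of the normalized
    // word whose length is between minGram and maxGram.
    static Set<String> indexedPrefixes(String word, int minGram, int maxGram) {
        Set<String> index = new HashSet<>();
        String normalized = normalize(word);
        for (int len = minGram; len <= Math.min(maxGram, normalized.length()); len++) {
            index.add(normalized.substring(0, len));
        }
        return index;
    }

    // Query term used as typed, without any analysis.
    static boolean matchesRaw(Set<String> index, String term) {
        return index.contains(term);
    }

    // Query term passed through the query-time analyzer first.
    static boolean matchesAnalyzed(Set<String> index, String term) {
        return index.contains(normalize(term));
    }

    public static void main(String[] args) {
        Set<String> index = indexedPrefixes("bou\u00e9es", 4, 10); // "bouées"
        System.out.println(matchesRaw(index, "bouees"));
        System.out.println(matchesRaw(index, "bou\u00e9es"));
        System.out.println(matchesAnalyzed(index, "Bou\u00e9es"));
    }
}
```

In this model the unaccented term matches even without analysis, the accented term only matches once it has been folded and lowercased, and the query side needs no n-gram step because the prefixes are already stored at index time.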
In Hibernate Search 5, you have to call overridesForField when creating the query builder, to override the analyzer for each field:
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Hospital.class)
        .overridesForField( "title.value", "edgeNGram_query" )
        .overridesForField( "keyword.values.value", "edgeNGram_query" )
        .overridesForField( "description.values.value", "edgeNGram_query" ) // the analyzer name is required here too
        .get();
// Then it's business as usual
Query query = queryBuilder
        .simpleQueryString()
        .boostedTo(3f) // This whole query is boosted so exact matches will obtain a better score
        .onFields("title.value", "keyword.values.value")
        .boostedTo(2f)
        .andField("description.values.value")
        //.withAndAsDefaultOperator()
        .matching(searchTerm)
        .createQuery();
Also see the end of (link missing), which may be where you originally got your code from? :)
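If you ever upgrade to Hibernate Search 6 (currently in Beta, with a different API), you will find this much simpler: there is an option to override the analyzer when building the predicate. For example: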
List<MyEntity> hits = searchSession.search( MyEntity.class )
        .where( f -> f.simpleQueryString()
                .fields( "title.value", "keyword.values.value" ).boost( 3f )
                .fields( "description.values.value" )
                .matching( searchTerm )
                //.defaultOperator( BooleanOperator.AND )
                .analyzer( "edgeNGram_query" ) ) // <= HERE
        .fetchHits( 20 );