How to boost occurrence of an entire term in Hibernate Search
With a definition like this:
@AnalyzerDef(name = "standard", charFilters = {
@CharFilterDef(factory = HTMLStripCharFilterFactory.class) },
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = StandardFilterFactory.class),
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = StopFilterFactory.class, params = {
@Parameter(name = "words", value = "/org/apache/lucene/analysis/snowball/english_stop.txt")}),
@TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
@Parameter(name = "minGramSize", value = "1"),
@Parameter(name = "maxGramSize", value = "15")})
}),
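I have two documents:
5456
and
5459 In the Jungle
With a search term like 5459, I want the second document to be returned higher in the results than the first. However, the fieldNorm of the second document is lower than that of the first.
How can I boost a document when the entire search term appears in it, rather than only part of it?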
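import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.similarities.ClassicSimilarity;

/**
 * Disables the effect of lengthNorm: only the index-time boost is kept,
 * so longer fields are no longer penalized.
 */
public class CustomSimilarity extends ClassicSimilarity
{
    @Override
    public float lengthNorm(FieldInvertState state) {
        return state.getBoost();
    }
}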
In the Spring YAML configuration:
spring.jpa.properties.hibernate.search.default.similarity : com.example.search.CustomSimilarity
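To see why overriding lengthNorm changes the ordering, here is a plain-Java sketch of the 1/sqrt(numTerms) shape that ClassicSimilarity uses for the length norm, next to the flat variant from CustomSimilarity. It does not use Lucene at all. The class name, the method names and the term counts are assumptions made for illustration; the real counts depend on how the analyzer emits tokens, and Lucene also stores norms with reduced precision.

```java
// Sketch only: mirrors the 1/sqrt(numTerms) shape of the default length norm.
// The term counts used in main() are assumed, not read from a real index.
final class LengthNormSketch {

    /** Default-style norm: a field with more terms gets a smaller value. */
    static float defaultLengthNorm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Flat norm, as in CustomSimilarity: only the boost remains. */
    static float flatLengthNorm(float boost, int numTerms) {
        return boost;
    }

    public static void main(String[] args) {
        int shortDocTerms = 1; // assumed count for "5456"
        int longDocTerms = 2;  // assumed count for "5459 In the Jungle" without stop words

        System.out.println("default, short doc: " + defaultLengthNorm(1.0f, shortDocTerms));
        System.out.println("default, long doc:  " + defaultLengthNorm(1.0f, longDocTerms));
        System.out.println("flat, short doc:    " + flatLengthNorm(1.0f, shortDocTerms));
        System.out.println("flat, long doc:     " + flatLengthNorm(1.0f, longDocTerms));
    }
}
```

With the default-style norm the longer field always ends up with the smaller value, which matches the lower fieldNorm described in the question. With the flat variant both fields get the same norm, so length no longer affects the ranking.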
I would use two fields: one without ngrams and boosted, the other with ngrams and not boosted.
@AnalyzerDefs({
@AnalyzerDef(
name = "ngram",
charFilters = {
@CharFilterDef(factory = HTMLStripCharFilterFactory.class)
},
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = StandardFilterFactory.class),
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = StopFilterFactory.class, params = {
@Parameter(name = "words", value = "/org/apache/lucene/analysis/snowball/english_stop.txt")}),
@TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
@Parameter(name = "minGramSize", value = "1"),
@Parameter(name = "maxGramSize", value = "15")
})
}
),
@AnalyzerDef(
name = "standard",
charFilters = {
@CharFilterDef(factory = HTMLStripCharFilterFactory.class)
},
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = StandardFilterFactory.class),
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = StopFilterFactory.class, params = {
@Parameter(name = "words", value = "/org/apache/lucene/analysis/snowball/english_stop.txt")})
}
),
})
@Indexed
@Entity
public class MyEntity {
@Fields({
@Field(name = "myField_ngram", analyzer = @Analyzer(definition = "ngram")),
@Field(name = "myField_standard", analyzer = @Analyzer(definition = "standard"))
})
private String myField;
// ...
}
Then query like this:
QueryBuilder qb = fullTextSession.getSearchFactory()
.buildQueryBuilder()
.forEntity( MyEntity.class )
.overridesForField( "myField_ngram", "standard" ) // Don't generate ngrams when querying, it serves no purpose
.get();
Query query = qb.keyword()
.onField( "myField_standard" ).boostedTo(2.0f)
.andField( "myField_ngram" )
.matching( "5459 In the Jungle" )
.createQuery();
Disclaimer: I did not test this code.
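The intent of the two-field approach can be illustrated with a toy model in plain Java. This is not Lucene's scoring (there is no TF-IDF and there are no norms), and the class name, the whitespace tokenizer, the two-word stop list and the additive score are all assumptions made for the sketch. It only shows that an exact match on the non-ngram field adds the boost on top of the ngram match, while a partial term matches the ngram field alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of a boosted "standard" field next to an unboosted edge n-gram field.
final class TwoFieldBoostSketch {

    // Assumed subset of the stop word list, for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("in", "the"));

    /** Lowercases, splits on whitespace and drops stop words (stand-in for the "standard" analyzer). */
    static List<String> standardTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Prefixes of every token with length minGram..maxGram (stand-in for the "ngram" analyzer). */
    static Set<String> edgeNGrams(String text, int minGram, int maxGram) {
        Set<String> grams = new HashSet<>();
        for (String token : standardTokens(text)) {
            for (int n = minGram; n <= Math.min(maxGram, token.length()); n++) {
                grams.add(token.substring(0, n));
            }
        }
        return grams;
    }

    /** Each query token adds standardBoost on an exact match and 1 on an n-gram match. */
    static float score(String document, String query, float standardBoost) {
        Set<String> exact = new HashSet<>(standardTokens(document));
        Set<String> grams = edgeNGrams(document, 1, 15);
        float score = 0f;
        for (String queryToken : standardTokens(query)) { // the query is not n-grammed
            if (exact.contains(queryToken)) {
                score += standardBoost;
            }
            if (grams.contains(queryToken)) {
                score += 1f;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        for (String query : new String[] {"545", "5459"}) {
            System.out.println(query + ": "
                    + score("5456", query, 2f) + " vs "
                    + score("5459 In the Jungle", query, 2f));
        }
    }
}
```

In this model the partial term 545 scores both documents equally through the ngram field, while the full term 5459 scores only the second document and adds the boost from the exact match.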