How can I search by a token that contains digits when using WordDelimiterFilterFactory?
I have the following configuration:
@AnalyzerDef(name = "autocompleteNGramAnalyzer",
    // Split input into tokens according to tokenizer
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        // Split each token into sub-words on intra-word delimiters
        @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
        // Normalize token text to lowercase, as the user is unlikely to
        // care about casing when searching for matches
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        // Index the leading 2 to 5 characters of each token
        @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
            @Parameter(name = "minGramSize", value = "2"),
            @Parameter(name = "maxGramSize", value = "5") }) })
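This works almost as expected, but there is a problem with words that contain digits.
For example: when I search by the token ab, Lucene returns abcdefg, but if I need to find a1 and the index contains a1b1c1d1, it returns nothing.
How can I change this configuration?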
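Unless you have other requirements that you did not mention, you should try removing WordDelimiterFilterFactory, or at least configure it properly (in particular, set preserveOriginal to 1) if you really need some of its features.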
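As a minimal sketch of the second option, the configuration below keeps WordDelimiterFilterFactory but sets preserveOriginal to "1". It assumes Hibernate Search 5.x annotations and the Lucene factory packages shown in the imports, which may differ in other versions; the class name AutocompleteAnalyzerConfig is hypothetical.

```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilterFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@AnalyzerDef(name = "autocompleteNGramAnalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        // Keep the unsplit token (e.g. "a1b1c1d1") alongside the split parts
        @TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
            @Parameter(name = "preserveOriginal", value = "1") }),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
            @Parameter(name = "minGramSize", value = "2"),
            @Parameter(name = "maxGramSize", value = "5") }) })
// Hypothetical holder class; in practice the annotation would sit on an
// indexed entity (or a package) that Hibernate Search scans.
public class AutocompleteAnalyzerConfig {
}
```

For the first option, simply drop the WordDelimiterFilterFactory line from the filters list and leave the rest unchanged.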
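By default, I think WordDelimiterFilter turns "a1b1c1d1" into something like ["a", "1", "b", "1", "c", "1", "d", "1"], which I doubt is useful in an "autocomplete" field. Each of those parts is a single character, shorter than your minGramSize of 2, so the edge n-gram filter most likely emits nothing for them, which would explain the empty result.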
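To make the effect concrete, here is a rough simulation of the filter chain in plain Java. It is not Lucene code: the class AutocompleteTokenDemo and its methods are hypothetical stand-ins that only mimic the letter/digit splitting, the lowercasing, and the edge n-gram step, under the assumption that tokens shorter than minGramSize produce no grams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AutocompleteTokenDemo {

    // Simplified stand-in for the word delimiter step: split wherever a
    // letter is followed by a digit or a digit is followed by a letter.
    static List<String> splitOnLetterDigitBoundaries(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (current.length() > 0) {
                char prev = current.charAt(current.length() - 1);
                if (Character.isDigit(prev) != Character.isDigit(c)) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    // Simplified stand-in for the edge n-gram step: emit prefixes of length
    // min..max; tokens shorter than min contribute nothing.
    static List<String> edgeNGrams(List<String> tokens, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (String t : tokens) {
            for (int len = min; len <= Math.min(max, t.length()); len++) {
                grams.add(t.substring(0, len));
            }
        }
        return grams;
    }

    // Terms that would end up in the index for one input token, with
    // minGramSize = 2 and maxGramSize = 5 as in the question.
    static List<String> indexedTerms(String token, boolean preserveOriginal) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> parts = splitOnLetterDigitBoundaries(lower);
        List<String> tokens = new ArrayList<>();
        if (preserveOriginal && parts.size() != 1) {
            tokens.add(lower); // keep the unsplit token as well
        }
        tokens.addAll(parts);
        return edgeNGrams(tokens, 2, 5);
    }

    public static void main(String[] args) {
        System.out.println(indexedTerms("abcdefg", false));  // [ab, abc, abcd, abcde]
        System.out.println(indexedTerms("a1b1c1d1", false)); // []
        System.out.println(indexedTerms("a1b1c1d1", true));  // [a1, a1b, a1b1, a1b1c]
    }
}
```

In this sketch the query a1 only finds a term to match once the original token is preserved (or the splitting step is removed entirely), which is consistent with the behavior described in the question.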