How can I search by tokens containing digits when using WordDelimiterFilterFactory?

I have the following configuration:

@AnalyzerDef(name = "autocompleteNGramAnalyzer",

// Split input into tokens according to tokenizer
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),

filters = {
    // Split tokens on intra-word delimiters (case changes, letter/digit
    // transitions, punctuation)
    @TokenFilterDef(factory = WordDelimiterFilterFactory.class),

    // Normalize token text to lowercase, as the user is unlikely to
    // care about casing when searching for matches
    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
        @Parameter(name = "minGramSize", value = "2"),
        @Parameter(name = "maxGramSize", value = "5") }) })

This works almost as expected, but it breaks down for words that contain digits.

For example:

Searching with the token ab, Lucene returns abcdefg. But if I search for a1 and the index contains a1b1c1d1, it returns nothing.

How should I change this configuration?

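Unless you have other requirements you haven't mentioned, you should try removing WordDelimiterFilterFactory, or, if you really do need some of its features, at least configure it properly (in particular, set preserveOriginal to 1).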
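As a rough sketch of the second option, the filter entry from the question might be changed as shown below. This is only a fragment in the question's annotation style, not a complete class, and it assumes your Lucene version's WordDelimiterFilterFactory accepts the parameter under the name preserveOriginal; check the documentation for the version you use.

```java
// Fragment only: stands in for the WordDelimiterFilterFactory entry inside
// the filters list of the question's @AnalyzerDef.
@TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
    // Assumption: "1" makes the filter also emit the unsplit token,
    // so "a1b1c1d1" survives alongside its parts.
    @Parameter(name = "preserveOriginal", value = "1") }),
```

The simpler alternative is to delete this filter entry altogether, so the tokenizer output goes straight to the lowercase and edge n-gram filters.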

By default, I believe WordDelimiterFilter turns "a1b1c1d1" into something like ["a", "1", "b", "1", "c", "1", "d", "1"], and I doubt that is useful in an "autocomplete" field.
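To see why those one-character parts cannot match a1, here is a small plain-Java simulation. It does not call Lucene: splitOnLetterDigit and edgeNGrams are hypothetical helpers that only imitate, very roughly, splitting on letter/digit transitions and building edge n-grams with the minGramSize and maxGramSize from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DelimiterSketch {

    // Rough stand-in for letter/digit splitting: start a new part whenever
    // the character class (digit vs. non-digit) changes.
    static List<String> splitOnLetterDigit(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (current.length() > 0
                    && Character.isDigit(c)
                       != Character.isDigit(current.charAt(current.length() - 1))) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    // Prefixes of each token with length between min and max, similar in
    // spirit to an edge n-gram filter. Tokens shorter than min yield nothing.
    static List<String> edgeNGrams(List<String> tokens, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (String t : tokens) {
            for (int n = min; n <= Math.min(max, t.length()); n++) {
                grams.add(t.substring(0, n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> split = splitOnLetterDigit("a1b1c1d1");
        // Split parts are all one character long, so no 2..5 grams exist.
        System.out.println(split + " -> " + edgeNGrams(split, 2, 5));
        // Keeping the original token yields prefixes that include "a1".
        System.out.println(edgeNGrams(Collections.singletonList("a1b1c1d1"), 2, 5));
        // A letters-only word is not split here, which is why "ab" matches.
        System.out.println(edgeNGrams(splitOnLetterDigit("abcdefg"), 2, 5));
    }
}
```

In this simulation the split parts produce no grams at all, because every part is shorter than the minimum gram size of 2, while the unsplit token produces a1, a1b, a1b1 and a1b1c. That is consistent with the behaviour described in the question, though the real filter chain may differ in details.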