How does the Elasticsearch synonym token filter work when a synonym is multi-word?

Question

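Can someone explain how the synonym token filter works when a synonym is a multi-word expression and the tokenizer is whitespace? For example, suppose I have this simple mapping: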

PUT /test_index
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym" : {
                        "tokenizer" : "whitespace",
                        "filter" : ["synonym"]
                    }
                },
                "filter" : {
                    "synonym" : {
                        "type" : "synonym",
                        "lenient": true,
                        "synonyms" : ["multi word, bar => baz"]
                    }
                }
            }
        }
    }
}

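I don't understand how the filter can ever evaluate the term "multi word" if the whitespace tokenizer splits it into two tokens, "multi" and "word". As far as I understand it, the synonym filter would never see "multi word" as a single term to look up in the configured synonyms. Any help is appreciated.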

Answer 1

The answer can be found in this section of the documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/7.6/token-graphs.html

and in this blog post:

http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html

Some token filters can add tokens that span multiple positions. These can include tokens for multi-word synonyms, such as using "atm" as a synonym for "automatic teller machine". However, only some token filters, known as graph token filters (such as synonym_graph and word_delimiter_graph), accurately record the positionLength for multi-position tokens.
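To connect this with the question: the synonym filter does not look at one token in isolation. The rule text is itself split into tokens (Elasticsearch parses synonym rules with the tokenizer and any token filters that precede the synonym filter in the chain), and the filter then looks ahead across consecutive tokens in the stream, with the rule that consumes the most tokens winning. The Python below is only a rough simulation of that idea, not Lucene's implementation (which stores the rules in an FST). The names `whitespace_tokenize`, `parse_rules` and `apply_synonyms` are made up for illustration, and the sketch assumes explicit `=>` rules with a single replacement phrase.

```python
def whitespace_tokenize(text):
    """Split on whitespace, standing in for the whitespace tokenizer."""
    return text.split()


def parse_rules(rules):
    """Turn 'a b, c => d' rules into {input token tuple: replacement tokens}.

    Assumption: only explicit '=>' rules with one replacement phrase.
    """
    table = {}
    for rule in rules:
        left, right = rule.split("=>")
        replacement = tuple(whitespace_tokenize(right))
        for phrase in left.split(","):
            # The rule side is tokenized too, so "multi word" becomes
            # the two-token key ("multi", "word").
            table[tuple(whitespace_tokenize(phrase))] = replacement
    return table


def apply_synonyms(tokens, table):
    """Replace token sequences that match a rule, longest match first."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(tokens):
        # Look ahead over up to `longest` tokens, trying the longest span first.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in table:
                out.extend(table[candidate])
                i += n
                break
        else:
            # No rule starts at this token, so pass it through unchanged.
            out.append(tokens[i])
            i += 1
    return out


table = parse_rules(["multi word, bar => baz"])
print(apply_synonyms(whitespace_tokenize("a multi word query"), table))
# ['a', 'baz', 'query']
print(apply_synonyms(whitespace_tokenize("foo bar"), table))
# ['foo', 'baz']
```

So the whitespace tokenizer splitting "multi word" is not a problem in itself: the rule is stored as the same two-token sequence, and the filter matches the sequence rather than a single term.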

Indexing ignores the positionLength attribute and does not support token graphs containing multi-position tokens. However, queries, such as the match or match_phrase query, can use these graphs to generate multiple sub-queries from a single query string.

The following token filters can add tokens that span multiple positions but only record a default positionLength of 1:

- synonym
- word_delimiter

This means these filters will produce invalid token graphs for streams containing such tokens.

Avoid using invalid token graphs for search. Invalid graphs can cause unexpected search results.
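To see why a positionLength of 1 matters, here is a small Python sketch that treats a token stream as a graph and lists every path through it, which is roughly what a query does when it expands a graph into sub-queries. It is an illustration only: `Token` and `paths` are hypothetical names, not part of any Elasticsearch or Lucene API, and the positions are written out by hand for the "atm" example from the passage above.

```python
from collections import namedtuple

# term: the token text; position: where it starts;
# position_length: how many positions it spans.
Token = namedtuple("Token", ["term", "position", "position_length"])


def paths(tokens):
    """List every term sequence leading from position 0 to the end of the graph."""
    end = max(t.position + t.position_length for t in tokens)

    def walk(position):
        if position == end:
            return [[]]
        found = []
        for token in tokens:
            if token.position == position:
                # Follow this token's arc to the position where it ends.
                for rest in walk(position + token.position_length):
                    found.append([token.term] + rest)
        return found

    return [" ".join(p) for p in walk(0)]


base = [
    Token("automatic", 0, 1),
    Token("teller", 1, 1),
    Token("machine", 2, 1),
]

# Graph filter: "atm" is recorded as spanning all three positions.
graph = [Token("atm", 0, 3)] + base
# Non-graph filter: same token, but with the default positionLength of 1.
flat = [Token("atm", 0, 1)] + base

print(paths(graph))  # ['atm', 'automatic teller machine']
print(paths(flat))   # ['atm teller machine', 'automatic teller machine']
```

With the correct positionLength the two paths are the two real alternatives. With the default of 1, "atm" appears to replace only "automatic", so the graph contains the phrase "atm teller machine", which nobody wrote; this is the kind of invalid graph the documentation warns against using for search.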

