Synonyms do not seem to work with a wildcard request

I can't get synonyms to work on my ElasticSearch instance. I have tried several things, but nothing works, so here is my setup:

First, my synonyms.txt file:

hello => world
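
For reference, hello => world is the explicit-mapping form, which replaces hello with world at analysis time. If I wanted the two terms to be treated as equivalent instead, the line would use the comma form, for example:

hello, world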

Second, my index metadata:

"analysis": {
    "filter": {
        "ipSynonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
        },
        "ipAsciiFolding": {
            "type": "asciifolding",
            "preserve_original": "true"
        },
        "NoTokenPattern": {
            "type": "pattern_capture",
            "preserve_original": "true",
            "patterns": [".*"]
        }
    },
    "char_filter": {
        "ipCharFilter": {
            "type": "mapping",
            "mappings": ["'=>-",
            "_=>-"]
        }
    },
    "analyzer": {
        "ipStrictAnalyzer": {
            "filter": ["lowercase",
            "trim",
            "ipSynonym"],
            "type": "custom",
            "tokenizer": "ipStrictTokenizer"
        },
        "varIdAnalyser": {
            "type": "custom",
            "filter": ["lowercase",
            "trim"],
            "tokenizer": "varIdTokenizer"
        },
        "pathAnalyzer": {
            "type": "custom",
            "filter": ["lowercase"],
            "tokenizer": "pathTokenizer"
        },
        "ipAnalyzer": {
            "filter": ["icu_normalizer",
            "icu_folding",
            "ipSynonym"],
            "char_filter": ["ipCharFilter"],
            "type": "custom",
            "tokenizer": "ipTokenizer"
        }
    },
    "tokenizer": {
        "varIdTokenizer": {
            "pattern": "([\W_]+|[a-zA-Z0-9]+|[\w]+)",
            "type": "pattern",
            "group": "0"
        },
        "ipTokenizer": {
            "type": "icu_tokenizer"
        },
        "pathTokenizer": {
            "type": "pattern",
            "pattern": "/"
        },
        "ipStrictTokenizer": {
            "type": "keyword"
        }
    }
}
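
In case it helps, this is how I double-check that the analysis settings above are actually applied to the media index (just the index settings API):

curl -XGET 'http://localhost:9200/media/_settings?pretty'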

As you can see there, I created a 'synonym' filter named ipSynonym, with synonyms_path pointing to the synonyms.txt file I created in the ElasticSearch config folder.

You can see that I use this filter in both ipStrictAnalyzer and ipAnalyzer.

Here is what I get when I query the ElasticSearch API. First request:

http://localhost:9200/media/_analyze?analyzer=ipAnalyzer&text=hello/

Response:

{
    "tokens": [{
        "token": "world",
        "start_offset": 0,
        "end_offset": 5,
        "type": "SYNONYM",
        "position": 1
    }]
}

This makes me think the synonym filter is working fine, right? :)
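
For comparison, the same endpoint can be used to check the other direction, i.e. what the analyzer produces for the text world:

http://localhost:9200/media/_analyze?analyzer=ipAnalyzer&text=world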

So now I run this query in ElasticSearch:

"query": {
    "nested": {
        "query": {
            "wildcard": {
                "name.analyzed": {
                    "value": "*world*"
                }
            }
        },
        "path": "name"
    }
}
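
For reference, this query goes in the body of a _search request against the media index, something like:

curl -XPOST 'http://localhost:9200/media/_search' -d '{
    "query": {
        "nested": {
            "path": "name",
            "query": {
                "wildcard": {
                    "name.analyzed": {
                        "value": "*world*"
                    }
                }
            }
        }
    }
}'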

The output is exactly the item I want, this one:

{
    "_index": "media",
    "_type": "clipdocument",
    "_id": "2c215600-b21d-4355-a379-e44db5c9b354",
    "_score": 1,
    "_source": {
        "name": {
            "analyzed": "world",
            "notAnalyzed": "world"
        },
        "creationDate": "2015-02-27T23:27:58",
    }
}

Now I search for:

"query": {
    "nested": {
        "query": {
            "wildcard": {
                "name.analyzed": {
                    "value": "*hello*"
                }
            }
        },
        "path": "name"
    }
}

And I no longer find the document I found before. Why? :(

So I find the synonym system strange, but maybe that's just because I'm not used to it.

I tried again with a simpler mapping and it worked, but the first time (as in the example) I had broken my synonyms.txt file: I wrote hello => world when I actually wanted world => hello. So it sort of works now.
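
In other words, the corrected synonyms.txt should simply read:

world => hello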