Synonyms do not seem to work with a wildcard request
I can't get synonyms to work on my Elasticsearch. I have tried several approaches but nothing works, so here is my setup.
First, my synonyms.txt file:
hello => world
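The direction of this rule matters. In the Solr synonym format that the synonym filter reads, "hello => world" is an explicit mapping: hello is replaced by world, and world itself is left alone. A comma-separated line such as "hello, world" instead declares equivalent terms (with the filter's default expand: true, each one expands to all of them). The toy parser below only illustrates that difference; parse_synonym_rule is a hypothetical helper, not Elasticsearch's own parser, and it ignores comments, escaping and multi-word synonyms.

```python
from typing import Dict, List


def parse_synonym_rule(line: str) -> Dict[str, List[str]]:
    """Toy parser for one line of a Solr-format synonyms file.

    Returns a mapping from each input term to the terms that replace it.
    Illustration only; this is not Elasticsearch's parser.
    """
    if "=>" in line:
        # Explicit mapping: left-hand terms are replaced by right-hand terms,
        # in that direction only.
        left, right = line.split("=>", 1)
        sources = [t.strip() for t in left.split(",") if t.strip()]
        targets = [t.strip() for t in right.split(",") if t.strip()]
        return {src: targets for src in sources}
    # Equivalent synonyms (expand=true): every term maps to all the terms.
    terms = [t.strip() for t in line.split(",") if t.strip()]
    return {term: list(terms) for term in terms}


rule = parse_synonym_rule("hello => world")
print(rule)             # {'hello': ['world']}
print("world" in rule)  # False: "world" is not rewritten by this rule
```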
Second, my index metadata:
"analysis": {
"filter": {
"ipSynonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"ipAsciiFolding": {
"type": "asciifolding",
"preserve_original": "true"
},
"NoTokenPattern": {
"type": "pattern_capture",
"preserve_original": "true",
"patterns": [".*"]
}
},
"char_filter": {
"ipCharFilter": {
"type": "mapping",
"mappings": ["'=>-",
"_=>-"]
}
},
"analyzer": {
"ipStrictAnalyzer": {
"filter": ["lowercase",
"trim",
"ipSynonym"],
"type": "custom",
"tokenizer": "ipStrictTokenizer"
},
"varIdAnalyser": {
"type": "custom",
"filter": ["lowercase",
"trim"],
"tokenizer": "varIdTokenizer"
},
"pathAnalyzer": {
"type": "custom",
"filter": ["lowercase"],
"tokenizer": "pathTokenizer"
},
"ipAnalyzer": {
"filter": ["icu_normalizer",
"icu_folding",
"ipSynonym"],
"char_filter": ["ipCharFilter"],
"type": "custom",
"tokenizer": "ipTokenizer"
}
},
"tokenizer": {
"varIdTokenizer": {
"pattern": "([\\W_]+|[a-zA-Z0-9]+|[\\w]+)",
"type": "pattern",
"group": "0"
},
"ipTokenizer": {
"type": "icu_tokenizer"
},
"pathTokenizer": {
"type": "pattern",
"pattern": "/"
},
"ipStrictTokenizer": {
"type": "keyword"
}
}
}
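As you can see, I created a synonym filter named ipSynonym whose synonyms_path points to the synonyms.txt file I created in Elasticsearch's config folder.
You can also see that I use this filter in both ipStrictAnalyzer and ipAnalyzer.
Here is what I get when I query the Elasticsearch API.
First request: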
http://localhost:9200/media/_analyze?analyzer=ipAnalyzer&text=hello/
Response:
{
"tokens": [{
"token": "world",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 1
}]
}
This makes me think the synonym filter works correctly, right? :)
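The _analyze result above can be reproduced with a rough Python model of ipAnalyzer. This is a sketch under loud assumptions: a \w+ regex stands in for icu_tokenizer and str.lower() stands in for icu_normalizer plus icu_folding, so it will disagree with the real chain on non-ASCII text. It only shows the order of the steps (char filter, tokenizer, token filters) and where the synonym replacement happens.

```python
import re
from typing import Dict, List

# Hypothetical stand-ins for the ipAnalyzer chain; the real chain uses
# icu_tokenizer, icu_normalizer and icu_folding, which are NOT reproduced here.
CHAR_MAPPINGS: Dict[str, str] = {"'": "-", "_": "-"}   # ipCharFilter
SYNONYMS: Dict[str, List[str]] = {"hello": ["world"]}  # hello => world


def analyze(text: str) -> List[str]:
    # 1. Char filter: apply the mapping character filter.
    for src, dst in CHAR_MAPPINGS.items():
        text = text.replace(src, dst)
    # 2. Tokenizer: keep runs of word characters (rough icu_tokenizer stand-in).
    tokens = re.findall(r"\w+", text)
    # 3. Normalization: lowercase (rough icu_normalizer + icu_folding stand-in).
    tokens = [t.lower() for t in tokens]
    # 4. Synonym filter: replace each token by its mapped terms, if any.
    out: List[str] = []
    for tok in tokens:
        out.extend(SYNONYMS.get(tok, [tok]))
    return out


print(analyze("hello/"))  # ['world']
```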
So I now run this query in Elasticsearch:
"query": {
"nested": {
"query": {
"wildcard": {
"name.analyzed": {
"value": "*world*"
}
}
},
"path": "name"
}
}
The output is exactly the item I want, this one:
{
"_index": "media",
"_type": "clipdocument",
"_id": "2c215600-b21d-4355-a379-e44db5c9b354",
"_score": 1,
"_source": {
"name": {
"analyzed": "world",
"notAnalyzed": "world"
},
"creationDate": "2015-02-27T23:27:58"
}
}
Now I run this query:
"query": {
"nested": {
"query": {
"wildcard": {
"name.analyzed": {
"value": "*hello*"
}
}
},
"path": "name"
}
}
And I no longer get the document I found before. Why? :(
So the synonym system seems strange to me, but maybe that's just because I'm not used to it.
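A likely explanation, stated with some hedging: in Elasticsearch a wildcard query is a term-level query, so the pattern *hello* is compared against the terms stored in the index without going through the field's analyzer, and the synonym filter in particular is never applied to it. With hello => world, indexing the value world stores only the term world, so nothing matches *hello*. The sketch below models that with Python's fnmatch. It assumes name.analyzed uses ipAnalyzer (the mapping is not shown), and fnmatch is only a stand-in: it also understands [...] character classes, which Elasticsearch's wildcard query does not.

```python
from fnmatch import fnmatchcase
from typing import List

# Terms stored for name.analyzed when the value "world" is indexed with
# "hello => world" (assumption: the field uses ipAnalyzer). The rule only
# rewrites hello, so world is stored unchanged.
indexed_terms: List[str] = ["world"]


def wildcard_matches(pattern: str, terms: List[str]) -> bool:
    # Model of a term-level wildcard query: the raw pattern is matched against
    # each stored term and is never run through the synonym filter.
    return any(fnmatchcase(term, pattern) for term in terms)


print(wildcard_matches("*world*", indexed_terms))  # True
print(wildcard_matches("*hello*", indexed_terms))  # False
```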
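I retried with a simpler mapping and it worked. The first time (as in the example above) I got the synonyms.txt file wrong: I wrote hello => world when I actually wanted world => hello. So it sort of works now.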
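Reversing the rule fixes this particular case because the rewrite happens at index time: with world => hello, the stored term for world becomes hello, which *hello* matches (while *world* no longer does). An equivalence line "hello, world" stores both terms, so either wildcard would match. The toy comparison below assumes whitespace tokenization and lowercasing only; RULES, index_terms and wildcard_matches are hypothetical names for illustration. Because these synonyms are applied at index time, documents indexed before synonyms.txt changed keep their old terms until they are reindexed.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical synonym tables for three versions of synonyms.txt.
RULES: Dict[str, Dict[str, List[str]]] = {
    "hello => world": {"hello": ["world"]},
    "world => hello": {"world": ["hello"]},
    "hello, world": {"hello": ["hello", "world"], "world": ["hello", "world"]},
}


def index_terms(value: str, synonyms: Dict[str, List[str]]) -> List[str]:
    # Toy index-time analysis: lowercase, split on whitespace, apply synonyms.
    terms: List[str] = []
    for tok in value.lower().split():
        terms.extend(synonyms.get(tok, [tok]))
    return terms


def wildcard_matches(pattern: str, terms: List[str]) -> bool:
    # The pattern is compared to stored terms as-is (no synonym expansion).
    return any(fnmatchcase(term, pattern) for term in terms)


for rule, synonyms in RULES.items():
    terms = index_terms("world", synonyms)
    print(rule, terms, wildcard_matches("*hello*", terms), wildcard_matches("*world*", terms))
```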