Invalid synonym rule when using two files
I have two synonym files of a few thousand lines each. Here is a minimal example that reproduces the problem:
en_synonyms file:
cereal, semolina, wheat
fr_synonyms file:
ble, cereale, wheat
This is the error I get:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: wheat analyzed to a token (cereal) with position increment != 1 (got: 0)"
      }
    }
  },
  "status": 400
}
The mapping I use:
PUT wheat_syn
{
  "mappings": {
    "wheat": {
      "properties": {
        "description": {
          "type": "text",
          "fields": {
            "synonyms": {
              "type": "text",
              "analyzer": "syn_text"
            },
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "en_synonyms": {
          "type": "synonym",
          "tokenizer": "keyword",
          "synonyms_path": "analysis/en_synonyms.txt"
        },
        "fr_synonyms": {
          "type": "synonym",
          "tokenizer": "keyword",
          "synonyms_path": "analysis/fr_synonyms.txt"
        }
      },
      "analyzer": {
        "syn_text": {
          "tokenizer": "standard",
          "filter": ["lowercase", "en_synonyms", "fr_synonyms"]
        }
      }
    }
  }
}
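Both files contain the term wheat.
When I remove it from one of the files, the index is created successfully.
I considered merging the two files so that the result is: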
cereal, semolina, wheat, ble, cereale
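But in my case I can't do this by hand, because it would take too much time (I will look for a way to do it programmatically, depending on the answer to this question).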
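For what it's worth, the programmatic merge mentioned above could look roughly like the following sketch. It is not part of the original setup: the function name merge_synonym_groups is hypothetical, and it assumes every line is a plain comma-separated equivalence rule (no "=>" mappings). Lines that share at least one term are joined into one group using a small union-find.

```python
from collections import defaultdict


def merge_synonym_groups(lines):
    """Merge comma-separated synonym lines that share at least one term.

    Assumption: every rule is a plain equivalence list such as
    "cereal, semolina, wheat"; explicit "a => b" mappings are not handled.
    """
    parent = {}  # union-find forest: term -> parent term

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = [t.strip() for t in line.split(",") if t.strip()]
        for term in terms:
            union(terms[0], term)

    # Collect terms per root; dicts keep insertion order, so terms
    # appear in the order they were first seen.
    groups = defaultdict(list)
    for term in parent:
        groups[find(term)].append(term)
    return [", ".join(terms) for terms in groups.values()]
```

Called with the two example lines, this yields the single merged rule shown above; groups that share no term stay on separate lines.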
Found a simple solution:
Instead of using two files, I concatenated the contents of en_synonyms and fr_synonyms into a single file, all_synonyms:
cereal, semolina, wheat
ble, cereale, wheat
I then used that single file in the mapping.
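With files of a few thousand lines, the concatenation itself is easy to script. The snippet below is only a sketch of that step: the helper name concat_synonym_files and the example paths are assumptions, not something taken from the setup above.

```python
from pathlib import Path


def concat_synonym_files(sources, target):
    """Write the non-empty lines of every source file into one target file.

    Returns the number of lines written. The paths are whatever the
    caller passes in; nothing here is specific to Elasticsearch.
    """
    lines = []
    for source in sources:
        text = Path(source).read_text(encoding="utf-8")
        # Drop blank lines so the files can be joined without gaps.
        lines.extend(line for line in text.splitlines() if line.strip())
    Path(target).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

A call such as concat_synonym_files(["en_synonyms.txt", "fr_synonyms.txt"], "all_synonyms.txt") would produce the combined file (these file names are assumed). The analyzer then needs only one synonym filter whose synonyms_path points at that file, instead of chaining en_synonyms and fr_synonyms.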