Some multi-word synonyms are not working in Elasticsearch for nested fields

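I am trying to use a synonym analyzer at query time, but I am not getting the expected results. Can someone shed some light on this?

Here is my index mapping: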

{
  "jobs_user_profile_v2": {
    "mappings": {
      "profile": {
        "_all": {
          "enabled": false
        },
        "_ttl": {
          "enabled": true
        },
        "properties": {

          "rsa": {
            "type": "nested",
            "properties": {
              "answer": {
                "type": "string",
                "index_analyzer": "autocomplete",
                "search_analyzer": "synonym",
                "position_offset_gap": 100
              },
              "answerId": {
                "type": "long"
              },
              "answerOriginal": {
                "type": "string",
                "index": "not_analyzed"
              },
              "createdAt": {
                "type": "long"
              },
              "label": {
                "type": "string",
                "index": "not_analyzed"
              },
              "labelOriginal": {
                "type": "string",
                "index": "not_analyzed"
              },
              "question": {
                "type": "string",
                "index": "not_analyzed"
              },
              "questionId": {
                "type": "long"
              },
              "questionOriginal": {
                "type": "string"
              },
              "source": {
                "type": "integer"
              },
              "updatedAt": {
                "type": "long"
              }
            }
          }

        }
      }
    }
  }
}

The field to focus on is rsa.answer, which is the field I am querying.

My synonym mapping:

Beautician,Stylist,Make up artist,Massage therapist,Therapist,Spa,Hair Dresser,Salon,Beauty Parlour,Parlor => Beautician
Carpenter,Wood Worker,Furniture Carpenter => Carpenter
Cashier,Store Manager,Store Incharge,Purchase Executive,Billing Executive,Billing Boy => Cashier
Content Writer,Writer,Translator,Writing,Copywriter,Content Creation,Script Writer,Freelance Writer,Freelance Content Writer => Content Writer

My search query:

http://{{domain}}/jobs_user_profile_v2/_search

{
  "query": {
    "nested": {
      "path": "rsa",
      "query": {
        "query_string": {
          "query": "hair dresser",
          "fields": ["answer"],
          "analyzer": "synonym"
        }
      },
      "inner_hits": {
        "explain": true
      }
    }
  },
  "explain": true,
  "sort": [
    {
      "_score": {}
    }
  ]
}

It returns the proper Beautician and Cashier profiles for the search queries "hair dresser" and "billing executive", but it returns nothing for the "wood worker => Carpenter" case.

My analyzer results:

http://{{domain}}/jobs_user_profile_v2/_analyze?analyzer=synonym&text=hair dresser


{
  "tokens": [
    {
      "token": "beautician",
      "start_offset": 0,
      "end_offset": 12,
      "type": "SYNONYM",
      "position": 1
    }
  ]
}

Wood worker case:

http://{{domain}}/jobs_user_profile_v2/_analyze?analyzer=synonym&text=wood worker


{
  "tokens": [
    {
      "token": "carpenter",
      "start_offset": 0,
      "end_offset": 11,
      "type": "SYNONYM",
      "position": 1
    }
  ]
}

It is not working for some other cases as well.

My index analyzer settings:

 "analysis": {
          "filter": {
            "synonym": {
              "ignore_case": "true",
              "type": "synonym",
              "synonyms_path": "synonym.txt"
            },
            "autocomplete_filter": {
              "type": "edge_ngram",
              "min_gram": "3",
              "max_gram": "10"
            }
          },
          "analyzer": {
            "text_en_splitting_search": {
              "type": "custom",
              "filter": [
                "stop",
                "lowercase",
                "porter_stem",
                "word_delimiter"
              ],
              "tokenizer": "whitespace"
            },
            "synonym": {
              "filter": [
                "stop",
                "lowercase",
                "synonym"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "autocomplete": {
              "filter": [
                "lowercase",
                "autocomplete_filter"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "text_en_splitting": {
              "filter": [
                "lowercase",
                "porter_stem",
                "word_delimiter"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            },
            "text_general": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "edge_ngram_analyzer": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "edge_ngram_tokenizer"
            },
            "autocomplete_analyzer": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "whitespace"
            }
          },
          "tokenizer": {
            "edge_ngram_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "2",
              "type": "edgeNGram",
              "max_gram": "10"
            }
          }
        }

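For this case, a multi_match query is a better fit than query_string. Unlike query_string, multi_match does not split the query text into separate terms before analyzing it, so the synonym filter sees "wood worker" as a whole. query_string splits the text on whitespace first and analyzes each term on its own, so multi-word synonyms may not work as expected.

Example: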

{
   "query": {
      "nested": {
         "path": "rsa",
         "query": {
            "multi_match": {
               "query": "wood worker",
               "fields": [
                  "rsa.answer"
               ],
               "type" : "cross_fields",
               "analyzer": "synonym"
            }
         }
      }
   }
}

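If for some reason you prefer query_string, you need to wrap the whole query in double quotes so that it is not split into terms before analysis:

Example:

POST test/_search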
{
   "query": {
      "nested": {
         "path": "rsa",
         "query": {
            "query_string": {
               "query": "\"wood worker\"",
               "fields": [
                  "rsa.answer"
               ],
               "analyzer": "synonym"
            }
         }
      }
   }
}
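
The difference between the two behaviours can be illustrated with a toy model. This is not Elasticsearch code: the rule table, `analyze`, `analyze_whole` and `analyze_per_term` are hypothetical names made up for this sketch, and the rules are only a small subset of the Carpenter line from the synonym file. It just shows why a multi-word rule can only fire when the analyzer receives the whole phrase.

```python
# Toy model only: a simplified stand-in for a synonym filter, not Elasticsearch.

# Hypothetical rule table, taken from part of the "Carpenter" line in synonym.txt.
SYNONYMS = {
    ("wood", "worker"): "carpenter",
    ("furniture", "carpenter"): "carpenter",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in SYNONYMS)


def analyze(text):
    """Lowercase, split into tokens, then apply synonym rules (longest match first)."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in SYNONYMS:
                out.append(SYNONYMS[phrase])
                i += n
                break
        else:
            # No rule matched at this position: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def analyze_whole(query):
    """The analyzer receives the full query text (multi_match, or a quoted phrase)."""
    return analyze(query)


def analyze_per_term(query):
    """The query is split on whitespace first and each term is analyzed alone."""
    result = []
    for term in query.split():
        result.extend(analyze(term))
    return result


print(analyze_whole("wood worker"))     # ['carpenter']
print(analyze_per_term("wood worker"))  # ['wood', 'worker']
```

In the per-term case the synonym rule never sees "wood" and "worker" together, so the query ends up searching for the two original words instead of "carpenter".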