ElasticSearch Analyzer on text field

Question

Here is my field in Elasticsearch:

"keywordName": {
  "type": "text",
  "analyzer": "custom_stop"
}

Here is my analyzer:

"custom_stop": {
  "type": "custom",
  "tokenizer": "standard",
  "filter": [
    "my_stop",
    "my_snow",
    "asciifolding"
  ]
}

Here are my filters:

"my_stop": {
  "type": "stop",
  "stopwords": "_french_"
},
"my_snow": {
  "type": "snowball",
  "language": "French"
}
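
For reference, the field, analyzer and filter fragments could be assembled into a single index-creation body roughly as follows. This is only a sketch in Python that builds the dictionary; the function name `build_index_body` and the `mapping_type` parameter are assumptions made for illustration. The mapping type name `keyword` is taken from the `_type` value in the search response further down; on Elasticsearch versions without mapping types, `None` would be passed instead.

```python
import json


def build_index_body(mapping_type="keyword"):
    """Assemble the analyzer, filters and field mapping from the question."""
    analysis = {
        "filter": {
            "my_stop": {"type": "stop", "stopwords": "_french_"},
            "my_snow": {"type": "snowball", "language": "French"},
        },
        "analyzer": {
            "custom_stop": {
                "type": "custom",
                "tokenizer": "standard",
                # Filters run in this order: stop words, stemming, accent folding.
                "filter": ["my_stop", "my_snow", "asciifolding"],
            }
        },
    }
    properties = {
        "keywordName": {"type": "text", "analyzer": "custom_stop"}
    }
    # Older versions nest properties under a mapping type; newer ones do not.
    if mapping_type is None:
        mappings = {"properties": properties}
    else:
        mappings = {mapping_type: {"properties": properties}}
    return {"settings": {"analysis": analysis}, "mappings": mappings}


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```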

Here are the documents in my index (all in my only field, keywordName):

"canne a peche", "canne", "canne a peche telescopique", "iphone 8", "iphone 8 case", "iphone 8 cover", "iphone 8 charger", "iphone 8 new"

When I search for "canne", it returns the "canne" document, which is exactly what I want:

GET ads/_search
{
  "query": {
    "match": {
      "keywordName": {
        "query": "canne",
        "operator": "and"
      }
    }
  },
  "size": 1
}

When I search for "canne à pêche", it returns "canne a peche", which is also fine. The same goes for "Cannes à Pêche": it returns "canne a peche", which is OK.

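Here is the tricky part: when I search for "iphone 8", it returns "iphone 8 cover" instead of "iphone 8". If I change size to 5 (because the search returns 5 results containing "iphone 8"), I see that "iphone 8" is only the fourth result by score. First comes "iphone 8 cover", then "iphone 8 case", then "iphone 8 new", and only then "iphone 8" ...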

Here is the query result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.4009607,
    "hits": [
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 cover",
        "_score": 1.4009607,
        "_source": {
          "keywordName": "iphone 8 cover"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 case",
        "_score": 1.4009607,
        "_source": {
          "keywordName": "iphone 8 case"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 new",
        "_score": 0.70293105,
        "_source": {
          "keywordName": "iphone 8 new"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8",
        "_score": 0.5804671,
        "_source": {
          "keywordName": "iphone 8"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 charge",
        "_score": 0.46705723,
        "_source": {
          "keywordName": "iphone 8 charge"
        }
      }
    ]
  }
}

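How can I keep the flexibility I have with "canne a peche" (accents, capital letters, plural terms) while also telling Elasticsearch that, if there is an exact match ("iphone 8" = "iphone 8"), it should return that exact keywordName?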

Answer 1

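The match query uses the tf/idf algorithm. This means you get fuzzy search results sorted by frequency. If you want the exact match to be returned whenever one exists, you should run a query_string query first and, if it returns no results, fall back to your match query.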
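
A minimal sketch of that two-step approach in Python, under stated assumptions: `search` is a hypothetical callable that takes a request body and returns the parsed search response (it stands in for whichever client is used against the `ads` index), and the helper names `search_with_fallback`, `query_string_body` and `match_body` are made up for illustration. The exact step wraps the text in quotes so that `query_string` treats it as one phrase; whether that is strict enough for a given index is something to check, since the phrase is still analyzed.

```python
def search_with_fallback(search, exact_body, fallback_body):
    """Run the exact query first; use the fallback only if it finds nothing.

    `search` is a hypothetical callable: it takes a request body (dict)
    and returns the parsed search response (dict).
    """
    response = search(exact_body)
    if response["hits"]["hits"]:
        return response
    return search(fallback_body)


def query_string_body(text, field="keywordName"):
    # Quote the text so query_string treats it as a single phrase.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query": {
            "query_string": {"default_field": field, "query": '"%s"' % escaped}
        },
        "size": 1,
    }


def match_body(text, field="keywordName"):
    # Same match query as in the question.
    return {
        "query": {"match": {field: {"query": text, "operator": "and"}}},
        "size": 1,
    }


if __name__ == "__main__":
    # Stub standing in for a real client call; it never finds an exact hit.
    def fake_search(body):
        if "query_string" in body["query"]:
            return {"hits": {"hits": []}}
        return {"hits": {"hits": [{"_id": "canne a peche"}]}}

    result = search_with_fallback(
        fake_search,
        query_string_body("canne à pêche"),
        match_body("canne à pêche"),
    )
    print(result["hits"]["hits"][0]["_id"])
```

The cost of this design is a second round trip whenever the first query finds nothing.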

Answer 2

I suggest a mapping like this:

    "keywordName": {
      "type": "text",
      "analyzer": "custom_stop",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }

And this query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "keywordName": {
              "query": "iphone 8",
              "operator": "and"
            }
          }
        },
        {
          "term": {
            "keywordName.raw": {
              "value": "iphone 8"
            }
          }
        }
      ]
    }
  },
  "size": 10
}
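
To reuse this query for any search text, the body could be built by a small helper. The sketch below is in Python and only constructs the request body; the function name `build_query` and the `exact_boost` parameter are assumptions added for illustration (`boost` is a standard option of the `term` query, but the answer itself does not use it). The `term` clause is not analyzed, so it matches only when the stored `keywordName.raw` value is identical to the search text, including case and accents. A document that matches both `should` clauses gets the sum of their scores; if the exact document still does not come first, raising `exact_boost` increases its extra contribution.

```python
import json


def build_query(text, size=10, exact_boost=1.0):
    """Build the bool/should query from the answer for an arbitrary search text."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Analyzed match: goes through the custom_stop analyzer.
                    {"match": {"keywordName": {"query": text, "operator": "and"}}},
                    # Not analyzed: matches only when the stored value is identical.
                    {"term": {"keywordName.raw": {"value": text, "boost": exact_boost}}},
                ]
            }
        },
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps(build_query("iphone 8", exact_boost=2.0), indent=2))
```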

Tags: similarity, analyzer, elasticsearch, elasticsearch-analyzers