Elastic Search - Apply the appropriate analyzer to get accurate results

I am new to Elastic Search. I want to apply an analyzer that satisfies the searches below. Let me take an example. Suppose my documents contain the following texts:

  1. I am walking
  2. I walked to Ahmedabad
  3. Walk every morning
  4. Anil walks in the evening.
  5. I am hiring candidates
  6. I hired a candidate
  7. I hire candidates every day
  8. He hires candidates

Now, when I search with:

  1. text "walking", the result should be [walking, walked, walk, walks]
  2. text "walked", the result should be [walking, walked, walk, walks]
  3. text "walk", the result should be [walking, walked, walk, walks]
  4. text "walks", the result should be [walking, walked, walk, walks]

The same kind of results should apply for "hire" as well:

  1. text "hiring", the result should be [hiring, hired, hire, hires]
  2. text "hired", the result should be [hiring, hired, hire, hires]
  3. text "hire", the result should be [hiring, hired, hire, hires]
  4. text "hires", the result should be [hiring, hired, hire, hires]

Thanks,

You need to use the stemmer token filter.

Stemming is the process of reducing a word to its root form. This ensures variants of a word match during a search.

For example, walking and walked can be stemmed to the same root word: walk. Once stemmed, an occurrence of either word would match the other in a search.
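
As a quick illustration (just a sketch using the default stemmer, not part of the mapping below), you can pass a tokenizer and the stemmer token filter directly to the _analyze API without creating an index first:

# Sketch: ad-hoc analysis, no index required
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "lowercase", "stemmer" ],
  "text": [ "walking", "walked", "walks" ]
}

All three inputs come back as the single token "walk".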

Mapping

PUT index36
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }, 
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "stemmer" ,"lowercase"]
        }
      }
    }
  }
}

Analyze

GET index36/_analyze
{
  "text": ["walking", "walked", "walk", "walks"],
  "analyzer": "my_analyzer"
}

Result

{
  "tokens" : [
    {
      "token" : "walk",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "walk",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 101
    },
    {
      "token" : "walk",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "word",
      "position" : 202
    },
    {
      "token" : "walk",
      "start_offset" : 20,
      "end_offset" : 25,
      "type" : "word",
      "position" : 303
    }
  ]
}

All four words produce the same token, "walk". Therefore, a search for any one of these words will match the others.
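
To see this in an actual search (a sketch that assumes the example sentences above have been indexed into index36 in the title field), a match query for any one variant also returns the documents containing the other variants:

# Sketch: assumes documents such as "I am walking" are indexed into index36
GET index36/_search
{
  "query": {
    "match": {
      "title": "walking"
    }
  }
}

Both the indexed text and the query string go through my_analyzer, so "walking" is reduced to "walk" and matches "walked", "walk" and "walks" as well.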

What you are looking for is a language analyzer; see the documentation here.

A language analyzer always consists of a tokenizer and a set of token filters, as the example below shows.

PUT /english_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_" 
        },
        "english_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["example"] 
        },
        "english_stemmer": {
          "type":       "stemmer",
          "language":   "english"
        },
        "english_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        }
      },
      "analyzer": {
        "rebuilt_english": {
          "tokenizer":  "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  }
}
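
If you want to sanity-check the analyzer before using it (just a sketch; the exact tokens depend on the configured stop words and stemmer), the same _analyze call works against this index as well:

# Sketch: verify the custom analyzer defined above
GET /english_example/_analyze
{
  "analyzer": "rebuilt_english",
  "text": "I am hiring candidates"
}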

You can now use the analyzer in your index mapping like this:

{ mappings": {
        "myindex": {
            "properties": {
                "myField": {
                    "type": "keyword",
                    "analyzer": "rebuilt_english"
                }
            }
        }
    }
}

Remember to use a match query in order to search the full text.
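
For example (a sketch that assumes documents have been indexed into english_example with the myField mapping above), the match query analyzes the query string with rebuilt_english, so searching for "hired" also finds "hire", "hires" and "hiring":

# Sketch: assumes documents exist in english_example under myField
GET /english_example/_search
{
  "query": {
    "match": {
      "myField": "hired"
    }
  }
}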