How to use stop words in Elasticsearch

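I have Elasticsearch 1.5 running on my server.

Specifically, I want to create three fields:

1. name

2. description

3. nickname

I want to configure stop words for the description and nickname fields so that, when I insert data into Elasticsearch, the unwanted stop words are removed automatically. I have tried many times without success.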

curl -X POST http://127.0.0.1:9200/tryoindex/ -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "custom_english_stemmer": {
          "type": "stemmer",
          "name": "english"
        },
        "snowball": {
          "type": "snowball",
          "language": "English"
        }
      },
      "analyzer": {
        "custom_lowercase_stemmed": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "custom_english_stemmer",
            "snowball"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "_all": {"enabled": true},
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "custom_lowercase_stemmed"
        }
      }
    }
  }
}'

curl -X POST "http://localhost:9200/tryoindex/nama/1" -d '{
  "text" : "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your"
}'

curl "http://localhost:9200/tryoindex/nama/_search?pretty=1" -d '{
"query": {
    "query_string": {
        "query": "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your",
        "fields": ["text"]
    }
  }
}'

You need to add the stop token filter to your analyzer's filter chain.

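Change the analyzer section of your settings to the following. Note that "stop" comes after "lowercase": the built-in English stop list is lowercase and matching is case-sensitive by default, so capitalized words such as "And" would otherwise slip through.

"analyzer": {
    "custom_lowercase_stemmed": {
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "stop",
        "custom_english_stemmer",
        "snowball"
      ]
    }
  }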
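Putting this together for the three fields in the question, a minimal sketch of a complete index definition might look like the following. It assumes Elasticsearch 1.x syntax (the "string" field type) and reuses the host, the index name tryoindex, and the type name nama from the question. It places "lowercase" before "stop", because the built-in English stop list is lowercase and matching is case-sensitive by default. The mapping is declared under nama rather than test, since the documents in the question are indexed into the nama type and a mapping declared for a different type would most likely not be applied to them. The choice to leave the name field on the default analyzer is an assumption.

```shell
# Sketch only: host, index (tryoindex) and type (nama) come from the question.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Settings and mapping body (Elasticsearch 1.x syntax).
INDEX_BODY=$(cat <<'JSON'
{
  "settings": {
    "analysis": {
      "filter": {
        "custom_english_stemmer": { "type": "stemmer", "name": "english" }
      },
      "analyzer": {
        "custom_lowercase_stemmed": {
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "custom_english_stemmer"]
        }
      }
    }
  },
  "mappings": {
    "nama": {
      "properties": {
        "name":        { "type": "string" },
        "description": { "type": "string", "analyzer": "custom_lowercase_stemmed" },
        "nickname":    { "type": "string", "analyzer": "custom_lowercase_stemmed" }
      }
    }
  }
}
JSON
)

# Sends the create-index request. The index must not already exist, and
# documents indexed before the change are not re-analyzed automatically.
create_index() {
  curl -s -X POST "$ES_HOST/tryoindex/" -d "$INDEX_BODY"
}

# Run explicitly when ready, e.g.:  create_index
```

If tryoindex already exists from earlier attempts, it would need to be deleted (or a new index name used) before calling create_index, and the documents indexed again afterwards.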

To verify the change, run:

curl -XGET 'localhost:9200/tryoindex/_analyze?analyzer=custom_lowercase_stemmed' -d 'testing this is stopword testing'

and inspect the tokens that come back:

{"tokens":[{"token":"test","start_offset":0,"end_offset":7,"type":"<ALPHANUM>","position":1},{"token":"stopword","start_offset":16,"end_offset":24,"type":"<ALPHANUM>","position":4},{"token":"test","start_offset":25,"end_offset":32,"type":"<ALPHANUM>","position":5}]}

PS: If you don't want the stemmed form of "testing" (that is, "test"), remove the stemmer filters.
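For the no-stemming variant, a minimal sketch could look like this. It also shows a custom stop word list through the stop filter's "stopwords" parameter, in case the built-in English list is not the set you want removed. The index name tryoindex_nostem, the filter name my_stop, the analyzer name lowercase_stop, and the word list itself are all hypothetical.

```shell
# Sketch only: every name below except the host is made up for illustration.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Analyzer with lowercasing and a custom stop list, but no stemmer filters.
NOSTEM_BODY=$(cat <<'JSON'
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": { "type": "stop", "stopwords": ["and", "is", "your"] }
      },
      "analyzer": {
        "lowercase_stop": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop"]
        }
      }
    }
  }
}
JSON
)

# Creates the (hypothetical) index with the analyzer above.
create_nostem_index() {
  curl -s -X POST "$ES_HOST/tryoindex_nostem/" -d "$NOSTEM_BODY"
}

# Asks Elasticsearch how the analyzer tokenizes a sample sentence.
analyze_sample() {
  curl -s -X GET "$ES_HOST/tryoindex_nostem/_analyze?analyzer=lowercase_stop" \
    -d 'running monkeys and jumping elephants'
}

# Run explicitly when ready, e.g.:  create_nostem_index && analyze_sample
```

With no stemmer in the chain, the returned tokens should keep their full surface forms (for example "running" rather than "run"), while the listed stop words are dropped.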