如何使用停用词elasticsearch
How to use stopword elasticsearch
我的服务器上有一个 Elasticsearch 1.5 运行,
具体来说,我 want/create 三个字段
1.name
2.description
3.nickname
我想在 Elasticsearch 上插入数据时为描述和昵称字段设置停用词,然后停用词自动删除不需要的停用词。我试了很多次都没用。
curl -X POST http://127.0.0.1:9200/tryoindex/ -d'
{
"settings": {
"analysis": {
"filter": {
"custom_english_stemmer": {
"type": "stemmer",
"name": "english"
},
"snowball": {
"type" : "snowball",
"language" : "English"
}
},
"analyzer": {
"custom_lowercase_stemmed": {
"tokenizer": "standard",
"filter": [
"lowercase",
"custom_english_stemmer",
"snowball"
]
}
}
}
},
"mappings": {
"test": {
"_all" : {"enabled" : true},
"properties": {
"text": {
"type": "string",
"analyzer": "custom_lowercase_stemmed"
}
}
}
}
}'
curl -X POST "http://localhost:9200/tryoindex/nama/1" -d '{
"text" : "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your"
}'
curl "http://localhost:9200/tryoindex/nama/_search?pretty=1" -d '{
"query": {
"query_string": {
"query": "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your",
"fields": ["text"]
}
}
}'
您需要在分析器过滤器链中使用 stop token filter。
将您的分析器部件更改为
"analyzer": {
"custom_lowercase_stemmed": {
"tokenizer": "standard",
"filter": [
"stop",
"lowercase",
"custom_english_stemmer",
"snowball"
]
}
}
要验证更改,请使用
curl -XGET 'localhost:9200/tryoindex/_analyze?analyzer=custom_lowercase_stemmed' -d 'testing this is stopword testing'
并观察标记
{"tokens":[{"token":"test","start_offset":0,"end_offset":7,"type":"<ALPHANUM>","position":1},{"token":"stopword","start_offset":16,"end_offset":24,"type":"<ALPHANUM>","position":4},{"token":"test","start_offset":25,"end_offset":32,"type":"<ALPHANUM>","position":5}]}%
PS:如果您不想获得词干版本的测试,请移除词干过滤器。
我的服务器上有一个 Elasticsearch 1.5 运行,
具体来说,我 want/create 三个字段
1.name
2.description
3.nickname
我想在 Elasticsearch 上插入数据时为描述和昵称字段设置停用词,然后停用词自动删除不需要的停用词。我试了很多次都没用。
curl -X POST http://127.0.0.1:9200/tryoindex/ -d'
{
"settings": {
"analysis": {
"filter": {
"custom_english_stemmer": {
"type": "stemmer",
"name": "english"
},
"snowball": {
"type" : "snowball",
"language" : "English"
}
},
"analyzer": {
"custom_lowercase_stemmed": {
"tokenizer": "standard",
"filter": [
"lowercase",
"custom_english_stemmer",
"snowball"
]
}
}
}
},
"mappings": {
"test": {
"_all" : {"enabled" : true},
"properties": {
"text": {
"type": "string",
"analyzer": "custom_lowercase_stemmed"
}
}
}
}
}'
curl -X POST "http://localhost:9200/tryoindex/nama/1" -d '{
"text" : "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your"
}'
curl "http://localhost:9200/tryoindex/nama/_search?pretty=1" -d '{
"query": {
"query_string": {
"query": "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your",
"fields": ["text"]
}
}
}'
您需要在分析器过滤器链中使用 stop token filter。
将您的分析器部件更改为
"analyzer": {
"custom_lowercase_stemmed": {
"tokenizer": "standard",
"filter": [
"stop",
"lowercase",
"custom_english_stemmer",
"snowball"
]
}
}
要验证更改,请使用
curl -XGET 'localhost:9200/tryoindex/_analyze?analyzer=custom_lowercase_stemmed' -d 'testing this is stopword testing'
并观察标记
{"tokens":[{"token":"test","start_offset":0,"end_offset":7,"type":"<ALPHANUM>","position":1},{"token":"stopword","start_offset":16,"end_offset":24,"type":"<ALPHANUM>","position":4},{"token":"test","start_offset":25,"end_offset":32,"type":"<ALPHANUM>","position":5}]}%
PS:如果您不想获得词干版本的测试,请移除词干过滤器。