Using the email tokenizer in Elasticsearch
I have tried several examples from the Elasticsearch documentation and from Google, but none of them helped me figure this out.
My sample data is just a few blog posts. I am trying to find all posts with a given email address. When I use "email":"someone", I see all posts that match someone, but when I change it to someone@gmail.com, nothing shows up!
"hits": [
  {
    "_index": "blog",
    "_type": "post",
    "_id": "2",
    "_score": 1,
    "_source": {
      "user": "sreenath",
      "email": "someone@gmail.com",
      "postDate": "2011-12-12",
      "body": "Trying to figure out this",
      "title": "Elastic search testing"
    }
  }
]
When I run the GET query below, I see all posts whose email matches someone@anything.com. But I want to change this
{ "term" : { "email" : "someone" }}
to { "term" : { "email" : "someone@gmail.com" }}
GET blog/post/_search
{
  "query" : {
    "filtered" : {
      "filter" : {
        "and" : [
          { "term" :
            { "email" : "someone" }
          }
        ]
      }
    }
  }
}
I ran the following curl -XPUT, but it did not help:
curl -XPUT localhost:9200/test/ -d '
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "email" : {
          "type" : "pattern_capture",
          "preserve_original" : 1,
          "patterns" : [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)"
          ]
        }
      },
      "analyzer" : {
        "email" : {
          "tokenizer" : "uax_url_email",
          "filter" : [ "email", "lowercase", "unique" ]
        }
      }
    }
  }
}
'
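You have created a custom analyzer for email addresses, but you are not using it. You need to declare the email field in your mapping type so that it actually uses that analyzer, as shown below. Also make sure to create the right index with that analyzer, i.e. blog and not test. If the blog index already exists, it has to be deleted and recreated with these settings, and the posts indexed again.
Without such a mapping, the email field is analyzed with the default standard analyzer, which splits someone@gmail.com at the @ sign. A term filter is not analyzed, so the term someone@gmail.com never matches any indexed token, while someone does.
The two changes are the index name in the URL (blog instead of test) and the added mappings section: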
curl -XPUT localhost:9200/blog/ -d '{
  "settings" : {
    "analysis" : {
      "filter" : {
        "email" : {
          "type" : "pattern_capture",
          "preserve_original" : 1,
          "patterns" : [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)"
          ]
        }
      },
      "analyzer" : {
        "email" : {
          "tokenizer" : "uax_url_email",
          "filter" : [ "email", "lowercase", "unique" ]
        }
      }
    }
  },
  "mappings": {
    "post": {
      "properties": {
        "email": {
          "type": "string",
          "analyzer": "email"
        }
      }
    }
  }
}
'
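To see why the term filter on the full address should match once the email analyzer is applied, here is a rough Python sketch that imitates the analyzer chain outside Elasticsearch. It is only an approximation and not how Elasticsearch implements it: the function names are made up for illustration, the whitespace split is a crude stand-in for the uax_url_email tokenizer, and `[^\W\d_]+` replaces `\p{L}+` because Python's re module does not support `\p{L}`.

```python
import re

# Patterns of the "email" pattern_capture filter from the settings above.
# Assumption: [^\W\d_]+ approximates \p{L}+ (one or more letters).
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)"]


def uax_url_email_tokenize(text):
    """Crude stand-in for uax_url_email: split on whitespace, keep addresses whole."""
    return text.split()


def pattern_capture(token, patterns=PATTERNS, preserve_original=True):
    """Return the original token (optionally) plus every capture-group match."""
    out = [token] if preserve_original else []
    for pattern in patterns:
        for match in re.finditer(pattern, token):
            out.extend(group for group in match.groups() if group)
    return out


def analyze_email(text):
    """Tokenize, apply pattern_capture, lowercase, then drop duplicates."""
    terms = []
    for token in uax_url_email_tokenize(text):
        for term in pattern_capture(token):
            term = term.lower()
            if term not in terms:
                terms.append(term)
    return terms


indexed_terms = analyze_email("someone@gmail.com")
print(indexed_terms)
# A term filter is not analyzed: it matches only if that exact term was indexed.
print("someone@gmail.com" in indexed_terms)
print("someone" in indexed_terms)
```

In this sketch both the full address and its parts (someone, gmail.com, gmail, com) end up among the indexed terms, which is why both term filters can match after reindexing. The order in which the real filter emits tokens may differ from this sketch.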