Using the email tokenizer in Elasticsearch

I tried some examples from the Elasticsearch documentation and from Google, but none of them helped me figure this out.

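My sample data is just a few blog posts. I am trying to find all posts with a given email address. When I use "email":"someone", I see all the posts matching someone, but when I change it to someone@gmail.com, nothing shows up!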

    "hits": [
             {
                "_index": "blog",
                "_type": "post",
                "_id": "2",
                "_score": 1,
                "_source": {
                   "user": "sreenath",
                   "email": "someone@gmail.com",
                   "postDate": "2011-12-12",
                   "body": "Trying to figure out this",
                   "title": "Elastic search testing"
                }
             }
           ]

When I use the GET query below, I see all posts matching someone@anything.com. But I want to change { "term" : { "email" : "someone" }} to { "term" : { "email" : "someone@gmail.com" }}

GET blog/post/_search
{ 
 "query" : { 
   "filtered" : { 
     "filter" : { 
       "and" : [ 
         { "term" :
            { "email" : "someone" }
         }
       ] 
     } 
   } 
 } 
}

I ran curl -XPUT with the following, but it didn't help:

curl -XPUT localhost:9200/test/  -d '
{
   "settings" : {
      "analysis" : {
         "filter" : {
            "email" : {
               "type" : "pattern_capture",
               "preserve_original" : 1,
               "patterns" : [
                  "([^@]+)",
                  "(\\p{L}+)",
                  "(\\d+)",
                  "@(.+)"
               ]
            }
         },
         "analyzer" : {
            "email" : {
               "tokenizer" : "uax_url_email",
               "filter" : [ "email", "lowercase",  "unique" ]
            }
         }
      }
   }
}
'

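You have created a custom analyzer for email addresses, but you are not using it. You need to declare the email field in your mapping type so that it actually uses that analyzer, as shown below. Also make sure to create the right index with that analyzer, i.e. blog and not test.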

                       change this
                            |
                            v
curl -XPUT localhost:9200/blog/  -d '{
   "settings" : {
      "analysis" : {
         "filter" : {
            "email" : {
               "type" : "pattern_capture",
               "preserve_original" : 1,
               "patterns" : [
                  "([^@]+)",
                  "(\\p{L}+)",
                  "(\\d+)",
                  "@(.+)"
               ]
            }
         },
         "analyzer" : {
            "email" : {
               "tokenizer" : "uax_url_email",
               "filter" : [ "email", "lowercase",  "unique" ]
            }
         }
      }
   },
   "mappings": {              <--- add this
      "post": {
         "properties": {
            "email": {
               "type": "string",
               "analyzer": "email"
            }
         }
      }
   }
}
'
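
To see why the mapping matters, here is a rough Python sketch of the tokens that end up in the index for someone@gmail.com with and without the email analyzer. It is only an emulation under stated assumptions, not Elasticsearch's own code: the function names are made up for illustration, the default analyzer is approximated by splitting at "@" and lowercasing, and `[^\W\d_]` stands in for `\p{L}` because Python's `re` module does not support that class.

```python
import re

# Patterns from the "email" pattern_capture filter above. Python's re module
# has no \p{L}, so [^\W\d_] (any Unicode letter) stands in for it here.
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)"]


def default_tokens(address):
    """Rough stand-in for the default analyzer: split at '@', lowercase."""
    return [part.lower() for part in address.split("@") if part]


def email_analyzer_tokens(address):
    """Rough stand-in for uax_url_email + pattern_capture + lowercase + unique."""
    # uax_url_email keeps the whole address as one token; preserve_original
    # keeps that token alongside the captured groups.
    tokens = [address]
    for pattern in PATTERNS:
        for match in re.finditer(pattern, address):
            tokens.extend(group for group in match.groups() if group)
    unique = []
    for token in (t.lower() for t in tokens):
        if token not in unique:
            unique.append(token)
    return unique


def term_matches(term, tokens):
    """A term filter is not analyzed: it needs an exact token match."""
    return term in tokens


if __name__ == "__main__":
    address = "someone@gmail.com"
    for name, analyze in [("default", default_tokens),
                          ("email", email_analyzer_tokens)]:
        tokens = analyze(address)
        print(name, tokens, term_matches(address, tokens))
```

In this sketch the full address is only present as a token when the email analyzer is applied, which is why the term filter on someone@gmail.com finds nothing until the field is mapped to that analyzer. Analysis happens at index time, so posts indexed before the mapping existed would need to be indexed again.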