Elasticsearch terms aggregation counts an email address as two different bucket keys
I store the field as "user1@user.com".
I am using this aggregation JSON query:
"aggregations": {
  "email-terms": {
    "terms": {
      "field": "l_obj.email",
      "size": 0,
      "shard_size": 0,
      "order": {
        "_count": "desc"
      }
    }
  }
}
I am getting this response:
"buckets" : [
  {
    "key" : "user.com",
    "doc_count" : 1
  },
  {
    "key" : "user1",
    "doc_count" : 1
  }
]
instead of:
"buckets" : [
  {
    "key" : "user1@user.com",
    "doc_count" : 1
  }
]
The same problem occurs with string values like user1.user2.user.com when I run a terms aggregation on them.
Am I missing something here?
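You need to set "index": "not_analyzed" on the "email" field in your mapping.
If I set up a toy index without specifying an analyzer (or specifying that none should be used), the standard analyzer is applied by default, and it splits the text on whitespace and on symbols such as "@". So, with this index definition: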
PUT /test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "email": {
          "type": "string"
        }
      }
    }
  }
}
if I add a document:
PUT /test_index/doc/1
{
  "email": "user1@user.com"
}
and then request a terms aggregation, I get two terms back:
POST /test_index/_search?search_type=count
{
  "aggregations": {
    "email-terms": {
      "terms": {
        "field": "email"
      }
    }
  }
}
...
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "email-terms": {
      "buckets": [
        {
          "key": "user.com",
          "doc_count": 1
        },
        {
          "key": "user1",
          "doc_count": 1
        }
      ]
    }
  }
}
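To check how a value is tokenized before it ever reaches the aggregation, the "_analyze" API can be queried directly. This is only a sketch: the first request assumes an Elasticsearch 1.x cluster like the one used in this example (analyzer and text passed as query-string parameters), and the second shows the JSON-body form that newer versions expect. Judging by the buckets above, the standard analyzer should return the tokens "user1" and "user.com".

```
# Elasticsearch 1.x style: analyzer and text passed as query-string parameters
GET /test_index/_analyze?analyzer=standard&text=user1@user.com

# Newer versions (5.x and later) take a JSON body instead
POST /test_index/_analyze
{
  "analyzer": "standard",
  "text": "user1@user.com"
}
```

Running the same request with user1.user2.user.com as the text shows how that value is split as well.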
But if I rebuild the index with "index": "not_analyzed" on that field and index the same document again:
DELETE /test_index
PUT /test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "email": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
PUT /test_index/doc/1
{
  "email": "user1@user.com"
}
and run the same terms aggregation, I get only a single term for that email address:
POST /test_index/_search?search_type=count
{
  "aggregations": {
    "email-terms": {
      "terms": {
        "field": "email"
      }
    }
  }
}
...
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "email-terms": {
      "buckets": [
        {
          "key": "user1@user.com",
          "doc_count": 1
        }
      ]
    }
  }
}
Here is all the code I used, in one place:
http://sense.qbox.io/gist/a73a28bf7450b637138b02a371fb15cabf344ab6
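If the field also has to stay searchable as analyzed text, one alternative (not part of the answer above) is a multi-field: keep "email" analyzed and add a not_analyzed sub-field used only for aggregations. A minimal sketch, assuming Elasticsearch 1.x mapping syntax; the sub-field name "raw" is an arbitrary choice for illustration, and documents must be indexed again after the mapping is recreated.

```
# Recreate the toy index (this deletes any documents already in it)
DELETE /test_index
PUT /test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "email": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

# Aggregate on the not_analyzed sub-field rather than on "email" itself
POST /test_index/_search?search_type=count
{
  "aggregations": {
    "email-terms": {
      "terms": {
        "field": "email.raw"
      }
    }
  }
}
```

In Elasticsearch 5.x and later the "string" type is replaced by "text" and "keyword"; there, the counterpart of a not_analyzed string is a field of type "keyword".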
We can also use an index template to predefine the field types (see http://www.elastic.co/guide/en/elasticsearch/reference/1.3/indices-templates.html), for example,
using a REST client or Elasticsearch Sense:
PUT/POST http://escluster:port/_template
{
  "testtemplate": {
    "aliases": {},
    "mappings": {
      "test1": {
        "_all": {
          "enabled": false
        },
        "_source": {
          "enabled": true
        },
        "properties": {
          "email": {
            "fielddata": {
              "format": "doc_values"
            },
            "index": "not_analyzed",
            "type": "string"
          }...
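For reference, a complete template request might look like the sketch below. It assumes the Elasticsearch 1.x template API, where the template name goes in the URL and the "template" key holds the index name pattern. The pattern "test*" is a hypothetical value, and the mapping repeats only the type and index settings of the "email" field shown above.

```
# "testtemplate" is the template name; "test*" is an assumed index name pattern
PUT /_template/testtemplate
{
  "template": "test*",
  "mappings": {
    "test1": {
      "properties": {
        "email": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

A template is applied only to indices created after it is registered and whose names match the pattern; existing indices keep their current mappings.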