Elasticsearch query to find missing records
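I have been trying to find a way, using the query DSL, to identify which groups of documents are missing a particular document. In my data set I have: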
Unique ID | Information
abc | Some data
abc | Special Information
abc | Some data
def | Some data
def | Special Information
def | Some data
ghi | Some data
ghi | Some data
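I would like to design a query that gives me the Unique IDs of the document sets that have no Special Information.
For example, for the data set above, the result would be ghi.
Thanks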
There are multiple unique IDs without Special Information. Start from here and adjust as needed:
Setup
PUT special_info
{
  "mappings": {
    "properties": {
      "unique_id": {
        "type": "keyword"
      },
      "information": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Sync
POST _bulk
{"index":{"_index":"special_info"}}
{"unique_id":"abc","information":"Some data"}
{"index":{"_index":"special_info"}}
{"unique_id":"abc","information":"Special Information"}
{"index":{"_index":"special_info"}}
{"unique_id":"abc","information":"Some data"}
{"index":{"_index":"special_info"}}
{"unique_id":"def","information":"Some data"}
{"index":{"_index":"special_info"}}
{"unique_id":"def","information":"Special Information"}
{"index":{"_index":"special_info"}}
{"unique_id":"def","information":"Some data"}
{"index":{"_index":"special_info"}}
{"unique_id":"ghi","information":"Some data"}
{"index":{"_index":"special_info"}}
{"unique_id":"ghi","information":"Some data"}
Query
GET special_info/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "information.keyword": {
              "value": "Special Information"
            }
          }
        }
      ]
    }
  },
  "_source": "unique_id",
  "aggs": {
    "by_unique_ids": {
      "terms": {
        "field": "unique_id"
      }
    }
  }
}
Yields
...
"aggregations" : {
  "by_unique_ids" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "abc",
        "doc_count" : 2
      },
      {
        "key" : "def",
        "doc_count" : 2
      },
      {
        "key" : "ghi",
        "doc_count" : 2
      }
    ]
  }
}
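To see why all three IDs still show up in these buckets, here is a minimal plain-Python sketch of what the query above does: it drops the documents whose information is "Special Information" and then counts what is left per unique_id. This is only an emulation on the sample data, not Elasticsearch client code, and `count_non_special` is a hypothetical helper name.

```python
from collections import Counter

# Sample documents copied from the bulk request above: (unique_id, information).
docs = [
    ("abc", "Some data"),
    ("abc", "Special Information"),
    ("abc", "Some data"),
    ("def", "Some data"),
    ("def", "Special Information"),
    ("def", "Some data"),
    ("ghi", "Some data"),
    ("ghi", "Some data"),
]


def count_non_special(documents, special="Special Information"):
    """Emulate must_not + terms: drop special documents, then count per unique_id."""
    remaining = [uid for uid, info in documents if info != special]
    return dict(Counter(remaining))


print(count_non_special(docs))  # {'abc': 2, 'def': 2, 'ghi': 2}
```

abc and def stay in the result because they also have documents without Special Information, so this query on its own does not isolate ghi.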
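I solved the above using aggregations.
I used a terms aggregation, a filter aggregation, and a bucket selector aggregation.
The terms aggregation creates one bucket per unique_id. Inside each bucket, the filter aggregation counts the documents that have Special Information. The bucket selector then keeps only the buckets where that count is 0.
Query: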
GET special_info/_search
{
  "size": 0,
  "aggs": {
    "unique_id": {
      "terms": {
        "field": "unique_id",
        "size": 10
      },
      "aggs": {
        "filter_special_infor": {
          "filter": {
            "term": {
              "information.keyword": "Special Information"
            }
          },
          "aggs": {
            "filtered_count": {
              "value_count": {
                "field": "unique_id"
              }
            }
          }
        },
        "doc_with_no_special_infor": {
          "bucket_selector": {
            "buckets_path": {
              "filteredCount": "filter_special_infor>filtered_count"
            },
            "script": "params.filteredCount == 0"
          }
        }
      }
    }
  }
}
Result:
"unique_id" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "ghi",
      "doc_count" : 2,
      "filter_special_infor" : {
        "doc_count" : 0,
        "filtered_count" : {
          "value" : 0
        }
      }
    }
  ]
}
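The three steps of this aggregation (bucket per unique_id, count of Special Information documents per bucket, keep buckets with a count of 0) can be sketched in plain Python as follows. This is an emulation on the sample data under the assumption that the information value matches exactly, as it does with the keyword sub-field; it is not Elasticsearch client code, and `ids_without_special` is a hypothetical helper name.

```python
from collections import defaultdict

# Sample documents from the question: (unique_id, information).
docs = [
    ("abc", "Some data"),
    ("abc", "Special Information"),
    ("abc", "Some data"),
    ("def", "Some data"),
    ("def", "Special Information"),
    ("def", "Some data"),
    ("ghi", "Some data"),
    ("ghi", "Some data"),
]


def ids_without_special(documents, special="Special Information"):
    """Return the unique_ids that have no document equal to `special`."""
    # Step 1 and 2: one bucket per unique_id, counting its special documents.
    special_counts = defaultdict(int)
    for uid, info in documents:
        special_counts[uid] += 1 if info == special else 0
    # Step 3: keep only the buckets whose special count is 0.
    return sorted(uid for uid, count in special_counts.items() if count == 0)


print(ids_without_special(docs))  # ['ghi']
```

One thing to keep in mind with the real query: "size": 10 caps the terms aggregation at ten buckets, and the bucket selector only filters the buckets that were returned, so a data set with more unique IDs would likely need a larger size.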