How can I get distinct values of nested fields in Elasticsearch?
My document structure in Elasticsearch is as follows:
root
|-- userid: string (nullable = true)
|-- name: string (nullable = true)
|-- applications: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- applicationid: string (nullable = true)
| | |-- createdat: string (nullable = true)
| | |-- source_name: string (nullable = true)
| | |-- accounts: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- applicationcreditreportaccountid: string (nullable = true)
| | | | |-- account_type: integer (nullable = true)
| | | | |-- account_department: string (nullable = true)
Below is my index mapping:
{
  "bureau_data" : {
    "mappings" : {
      "dynamic_date_formats" : [
        "yyyy-MM-dd"
      ],
      "dynamic_templates" : [
        {
          "objects" : {
            "match_mapping_type" : "object",
            "mapping" : {
              "type" : "nested"
            }
          }
        }
      ],
      "properties" : {
        "raw_derived" : {
          "type" : "nested",
          "properties" : {
            "applications" : {
              "type" : "nested",
              "properties" : {
                "accounts" : {
                  "type" : "nested",
                  "properties" : {
                    "account_type_name" : {
                      "type" : "text",
                      "fields" : {
                        "keyword" : {
                          "type" : "keyword",
                          "ignore_above" : 256
                        }
                      }
                    },
                    "accounttypeid" : {
                      "type" : "text",
                      "fields" : {
                        "keyword" : {
                          "type" : "keyword",
                          "ignore_above" : 256
                        }
                      }
                    },
                    "applicationcreditreportaccountid" : {
                      "type" : "text",
                      "fields" : {
                        "keyword" : {
                          "type" : "keyword",
                          "ignore_above" : 256
                        }
                      }
                    },
                    "currentbalance" : {
                      "type" : "long"
                    },
                    "dayspastdue" : {
                      "type" : "long"
                    },
                    "institution_name" : {
                      "type" : "text",
                      "fields" : {
                        "keyword" : {
                          "type" : "keyword",
                          "ignore_above" : 256
                        }
                      }
                    },
                    "institutionid" : {
                      "type" : "text",
                      "fields" : {
                        "keyword" : {
                          "type" : "keyword",
                          "ignore_above" : 256
                        }
                      }
                    }
                  }
                },
                "applicationcreditreportid" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "applicationid" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "createdat" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "creditbureautypeid" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "dateofbirth" : {
                  "type" : "date",
                  "format" : "yyyy-MM-dd"
                },
                "firstname" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "lastname" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "score" : {
                  "type" : "long"
                },
                "source_name" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "status" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "updatedat" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                }
              }
            },
            "dob" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "firstname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "lastname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "middlename" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "mobilephone" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "source" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "userid" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          },
          "fielddata" : true
        }
      }
    }
  }
}
I want the distinct values of the account_type field, which is a nested field. The query I tried returns only the distinct count:
GET /my_index/_search?size=0
{
  "aggs": {
    "nested_path": {
      "nested": {
        "path": "raw_derived.applications.accounts"
      },
      "aggs": {
        "distinct_values": {
          "cardinality": {
            "field": "raw_derived.applications.accounts.account_type.keyword"
          }
        }
      }
    }
  }
}
I expected the output to contain the distinct account_type values, but it contains only the count. Here is a snippet of my output:
  "hits" : {
    "total" : {
      "value" : 50,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nested_path" : {
      "doc_count" : 828,
      "distinct_values" : {
        "value" : 70
      }
    }
  }
}
The following query, which I also tried, works as expected:
GET /bureau_data/_search?size=0
{
  "_source": "{aggregations}",
  "aggs": {
    "unique": {
      "nested": {
        "path": "raw_derived.applications"
      },
      "aggs": {
        "score_unq": {
          "terms": {
            "field": "raw_derived.applications.source_name.keyword"
          }
        }
      }
    }
  }
}
Any suggestions would be helpful.
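From the official documentation:
Cardinality aggregation: a single-value metrics aggregation that calculates an approximate count of distinct values. Values can be extracted either from specific fields in the document or generated by a script.
So the cardinality aggregation can only ever return a count. To get the values themselves, try a terms aggregation like the one below instead: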
{
  "size": 0,
  "aggregations": {
    "nested_path": {
      "nested": {
        "path": "raw_derived.applications.accounts"
      },
      "aggregations": {
        "distinct_values": {
          "terms": {
            "field": "raw_derived.applications.accounts.account_type.keyword",
            "size": 1000,
            "min_doc_count": 1,
            "order": [
              { "_count": "desc" },
              { "_key": "asc" }
            ]
          }
        }
      }
    }
  }
}
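If you run the search from application code, the distinct values are the `key` of each bucket in the terms aggregation response. Below is a minimal Python sketch of building the request body and reading the keys back. The helper names `build_distinct_values_query` and `extract_distinct_values` are hypothetical, the aggregation names `nested_path` and `distinct_values` are just the labels used in the question, and the sample response in the usage part is made-up data shaped like an Elasticsearch response, not real output from this index. Sending the body to the cluster is left to whichever client you already use.

```python
from typing import Any, Dict, List


def build_distinct_values_query(path: str, field: str, size: int = 1000) -> Dict[str, Any]:
    """Build a search body: a terms aggregation wrapped in a nested aggregation.

    `path` is the nested object path and `field` is the full field name
    inside that path. `size` caps how many distinct values are returned.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "nested_path": {
                "nested": {"path": path},
                "aggs": {
                    "distinct_values": {
                        "terms": {"field": field, "size": size}
                    }
                },
            }
        },
    }


def extract_distinct_values(response: Dict[str, Any]) -> List[Any]:
    """Return the bucket keys (the distinct values) from a search response."""
    buckets = response["aggregations"]["nested_path"]["distinct_values"]["buckets"]
    return [bucket["key"] for bucket in buckets]


if __name__ == "__main__":
    body = build_distinct_values_query(
        "raw_derived.applications.accounts",
        "raw_derived.applications.accounts.account_type.keyword",
    )
    # Hypothetical response, for illustration only.
    sample_response = {
        "aggregations": {
            "nested_path": {
                "doc_count": 3,
                "distinct_values": {
                    "buckets": [
                        {"key": "type_a", "doc_count": 2},
                        {"key": "type_b", "doc_count": 1},
                    ]
                },
            }
        }
    }
    print(extract_distinct_values(sample_response))
```

The terms aggregation returns at most `size` buckets, so `size` should be at least the cardinality reported earlier (70 in the question) for the list to be complete.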