如何在 Elasticsearch 中获取重复的字段值及其计数
How to get duplicate field values and their count in Elasticsearch
我有一个使用 ELK 堆栈的学校项目。
我有很多数据,我想知道哪些日志行是重复的,以及根据日志级别、服务器和时间范围,该特定日志行有多少重复项。
我试过这个查询,成功地提取了重复的数字:
GET /_all/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"beat.hostname": "server-x"
}
},
{
"match": {
"log_level": "WARNING"
}
},{
"range": {
"@timestamp" : {
"gte" : "now-48h",
"lte" : "now"
}
}
}
]
}
},
"aggs": {
"duplicateNames": {
"terms": {
"field": "message_description.keyword",
"min_doc_count": 2,
"size": 10000
}
}
}
}
它成功地给我输出:
"aggregations" : {
"duplicateNames" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "AuthToken not found [ ]",
"doc_count" : 657
}
]
}
当我尝试完全相同的查询并且仅将 log_level
从 WARNING
更改为 CRITICAL
时,它给了我 0 个存储桶。这很奇怪,因为我可以在 Kibana 中看到有重复的 message_description
字段值。这与 .keyword
或 message_description
的长度有关吗?
我希望有人能帮我解决这个奇怪的问题。
编辑:
这两份文件完全一样message_description
,为什么我查不到结果?
{
"_index" : "filebeat-2019.09.17",
"_type" : "_doc",
"_id" : "yYzDP20BiDGBoVteKHjZ",
"_score" : 10.144365,
"_source" : {
"beat" : {
"name" : "graylog",
"hostname" : "server-x",
"version" : "6.8.2"
},
"message" : """[2019-09-17 17:06:57] request.CRITICAL: Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444 {"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
"@version" : "1",
"source" : "/data/httpd/xxx/xxx/var/log/dev.log",
"tags" : [
"beats_input_codec_plain_applied",
"_grokparsefailure",
"_dateparsefailure"
],
"timestamp" : "2019-09-17 17:06:57",
"input" : {
"type" : "log"
},
"offset" : 54819,
"prospector" : {
"type" : "log"
},
"application" : "request",
"log_level" : "CRITICAL",
"stack_trace" : """{"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
"message_description" : """Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444""",
"@timestamp" : "2019-09-17T15:06:57.436Z",
"host" : {
"name" : "graylog"
},
"log" : {
"file" : {
"path" : "/data/httpd/xxx/xxx/var/log/dev.log"
}
}
}
},
{
"_index" : "filebeat-2019.09.17",
"_type" : "_doc",
"_id" : "CYzDP20BiDGBoVteKHna",
"_score" : 10.144365,
"_source" : {
"beat" : {
"name" : "graylog",
"hostname" : "server-x",
"version" : "6.8.2"
},
"message" : """[2019-09-17 17:06:56] request.CRITICAL: Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444 {"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
"@version" : "1",
"source" : "/data/httpd/xxx/xxx/var/log/dev.log",
"tags" : [
"beats_input_codec_plain_applied",
"_grokparsefailure",
"_dateparsefailure"
],
"timestamp" : "2019-09-17 17:06:56",
"input" : {
"type" : "log"
},
"offset" : 45716,
"prospector" : {
"type" : "log"
},
"application" : "request",
"log_level" : "CRITICAL",
"stack_trace" : """{"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
"message_description" : """Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444""",
"@timestamp" : "2019-09-17T15:06:57.426Z",
"host" : {
"name" : "graylog"
},
"log" : {
"file" : {
"path" : "/data/httpd/xxx/xxx/var/log/dev.log"
}
}
}
}
发生的情况是 message_description
字段超过 256 个字符,因此 gets ignored。 运行 GET filebeat-2019.09.17
确认这一点。
您可以像这样修改字段的映射来增加该限制:
PUT filebeat-*/_doc/_mapping
{
"properties": {
"message_description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 500
}
}
}
}
}
然后更新这些索引中的所有数据:
POST filebeat-*/_update_by_query
完成后,您的查询将再次神奇地工作;-)
我有一个使用 ELK 堆栈的学校项目。
我有很多数据,我想知道哪些日志行是重复的,以及根据日志级别、服务器和时间范围,该特定日志行有多少重复项。
我试过这个查询,成功地提取了重复的数字:
GET /_all/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"beat.hostname": "server-x"
}
},
{
"match": {
"log_level": "WARNING"
}
},{
"range": {
"@timestamp" : {
"gte" : "now-48h",
"lte" : "now"
}
}
}
]
}
},
"aggs": {
"duplicateNames": {
"terms": {
"field": "message_description.keyword",
"min_doc_count": 2,
"size": 10000
}
}
}
}
它成功地给我输出:
"aggregations" : {
"duplicateNames" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "AuthToken not found [ ]",
"doc_count" : 657
}
]
}
当我尝试完全相同的查询并且仅将 log_level
从 WARNING
更改为 CRITICAL
时,它给了我 0 个存储桶。这很奇怪,因为我可以在 Kibana 中看到有重复的 message_description
字段值。这与 .keyword
或 message_description
的长度有关吗?
我希望有人能帮我解决这个奇怪的问题。
编辑:
这两份文件完全一样message_description
,为什么我查不到结果?
{
"_index" : "filebeat-2019.09.17",
"_type" : "_doc",
"_id" : "yYzDP20BiDGBoVteKHjZ",
"_score" : 10.144365,
"_source" : {
"beat" : {
"name" : "graylog",
"hostname" : "server-x",
"version" : "6.8.2"
},
"message" : """[2019-09-17 17:06:57] request.CRITICAL: Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444 {"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
"@version" : "1",
"source" : "/data/httpd/xxx/xxx/var/log/dev.log",
"tags" : [
"beats_input_codec_plain_applied",
"_grokparsefailure",
"_dateparsefailure"
],
"timestamp" : "2019-09-17 17:06:57",
"input" : {
"type" : "log"
},
"offset" : 54819,
"prospector" : {
"type" : "log"
},
"application" : "request",
"log_level" : "CRITICAL",
"stack_trace" : """{"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
"message_description" : """Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444""",
"@timestamp" : "2019-09-17T15:06:57.436Z",
"host" : {
"name" : "graylog"
},
"log" : {
"file" : {
"path" : "/data/httpd/xxx/xxx/var/log/dev.log"
}
}
}
},
{
"_index" : "filebeat-2019.09.17",
"_type" : "_doc",
"_id" : "CYzDP20BiDGBoVteKHna",
"_score" : 10.144365,
"_source" : {
"beat" : {
"name" : "graylog",
"hostname" : "server-x",
"version" : "6.8.2"
},
"message" : """[2019-09-17 17:06:56] request.CRITICAL: Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444 {"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
"@version" : "1",
"source" : "/data/httpd/xxx/xxx/var/log/dev.log",
"tags" : [
"beats_input_codec_plain_applied",
"_grokparsefailure",
"_dateparsefailure"
],
"timestamp" : "2019-09-17 17:06:56",
"input" : {
"type" : "log"
},
"offset" : 45716,
"prospector" : {
"type" : "log"
},
"application" : "request",
"log_level" : "CRITICAL",
"stack_trace" : """{"exception":"[object] (ErrorException(code: 0): Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php:444)"} []""",
"message_description" : """Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/xxx/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/xxx/xxx/vendor/composer/ClassLoader.php line 444""",
"@timestamp" : "2019-09-17T15:06:57.426Z",
"host" : {
"name" : "graylog"
},
"log" : {
"file" : {
"path" : "/data/httpd/xxx/xxx/var/log/dev.log"
}
}
}
}
发生的情况是 message_description
字段超过 256 个字符,因此 gets ignored。 运行 GET filebeat-2019.09.17
确认这一点。
您可以像这样修改字段的映射来增加该限制:
PUT filebeat-*/_doc/_mapping
{
"properties": {
"message_description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 500
}
}
}
}
}
然后更新这些索引中的所有数据:
POST filebeat-*/_update_by_query
完成后,您的查询将再次神奇地工作;-)