How to match multiple inputs in Elasticsearch

I am trying to query all matching logs across 3 environments (dev, test, prod) using the terms query below. I have tried both must and should:

curl -vs -o -X POST http://localhost:9200/*/_search?pretty=true -d '
{
    "query": {
        "bool": {
           "minimum_should_match": 1,
           "should": {
                "terms": {
                    "can.deployment": ["can-prod", "can-test", "can-dev"]
                }
           },
            "filter": [{
                    "range": {
                        "@timestamp": {
                            "gte": "2020-05-02T17:22:29.069Z",
                            "lt": "2020-05-23T17:23:29.069Z"
                        }
                    }
            }, {
                    "terms": {
                        "can.level": ["WARN", "ERROR"]
                    }
            }, {
                    "terms": {
                        "can.class": ["MTMessage", "ParserService", "JsonParser"]
                    }
            }]
        }
    }
}'

This returns:

{
  "took" : 871,
  "timed_out" : false,
  "_shards" : {
    "total" : 391,
    "successful" : 389,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

However, if I replace terms with match, it works. But then I cannot query for the other inputs, for example WARN messages, or logs related to the ParserService class, and so on:

curl -vs -o -X POST http://localhost:9200/*/_search?pretty=true -d '
  {
    "query": {
        "bool": {
            "should": 
                [{"match": {"can.deployment": "can-prod"}}],
            "filter": [{
                    "range": {
                        "@timestamp": {
                            "gte": "2020-03-20T17:22:29.069Z",
                            "lt": "2020-05-01T17:23:29.069Z"
                        }
                    }
            },{
                    "match": {
                        "can.level": "ERROR"
                    }
            },{
                    "match": {
                        "can.class": "MTMessage"
                    }
            }
        ]
        }
    }
  }'

How can I accomplish this, with or without terms/match? I tried the following with no luck. I get 0 search results:

                    "match": {
                        "can.level": "ERROR"
                    }
            },{
                    "match": {
                        "can.level": "WARN"
                    }
            },{
                    "match": {
                        "can.class": "MTMessage"
                    }
            }

Any hints would certainly help. TIA!

[Edit] Adding the mapping (/_mapping?pretty=true):

          "can" : {
            "properties" : {
              "class" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "deployment" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "level" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },

Adding a sample document:

{
  "took" : 50,
  "timed_out" : false,
  "_shards" : {
    "total" : 391,
    "successful" : 387,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 5.44714,
    "hits" : [
      {
        "_index" : "filebeat-6.1.2-2020.05.21",
        "_type" : "doc",
        "_id" : "AXI9K_cggA4T9jvjZc03",
        "_score" : 5.44714,
        "_source" : {
          "@timestamp" : "2020-05-21T02:59:25.373Z",
          "offset" : 34395681,
          "beat" : {
            "hostname" : "4c80d1588455-661e-7054-a4e5-73c821d7",
            "name" : "4c80d1588455-661e-7054-a4e5-73c821d7",
            "version" : "6.1.2"
          },
          "prospector" : {
            "type" : "log"
          },
          "source" : "/var/logs/packages/gateway_mt/1a27957180c2b57a53e76dd686a06f4983bf233f/logs/gateway_mt.log",
          "message" : "[2020-05-21 02:59:25.373] ERROR can_gateway_mt [ActiveMT SNAP Worker 18253] --- ClientIdAuthenticationFilter: Cannot authorize publishing from client ThingPayload_4325334a89c9 : not authorized",
          "fileset" : {
            "module" : "can",
            "name" : "services"
          },
          "fields" : { },
          "can" : {
            "component" : "can_gateway_mt",
            "instancename" : "canservices/0",
            "level" : "ERROR",
            "thread" : "ActiveMT SNAP Worker 18253",
            "message" : "Cannot authorize publishing from client ThingPayload_4325334a89c9 : not authorized",
            "class" : "ClientIdAuthenticationFilter",
            "timestamp" : "2020-05-21 02:59:25.373",
            "deployment" : "can-prod"
          }
        }
      }
    ]
  }
}

Expected output: I am trying to get a dump of the entire documents that match the criteria, similar to the sample document above.

"query": {
        "bool": {
           "minimum_should_match": 1,
           "should": {
                "terms": {
                    "can.deployment": ["can-prod", "can-test", "can-dev"]
                }
           },
            "filter": [{
                    "range": {
                        "@timestamp": {
                            "gte": "2020-05-02T17:22:29.069Z",
                            "lt": "2020-05-23T17:23:29.069Z"
                        }
                    }
            }, {
                    "terms": {
                        "can.level": ["WARN", "ERROR"]
                    }
            }, {
                    "terms": {
                        "can.class": ["MTMessage", "ParserService", "JsonParser"]
                    }
            }]
        }
    }

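I think the search query above does not work because your fields can.deployment, can.level and can.class are text fields. Elasticsearch analyzes text fields with the standard analyzer by default, which splits the text into tokens on word boundaries and lowercases every token. See the Elasticsearch documentation on the standard analyzer for more details.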

In your case, for example, the can.deployment field value can-prod is analyzed as:

{
    "tokens": [
        {
            "token": "can",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "prod",
            "start_offset": 4,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}
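
To reproduce this token list yourself, you can send the text to the `_analyze` API with the standard analyzer. The sketch below only builds that request body and approximates the tokenization locally. `approx_standard_tokens` is a hypothetical helper, not part of any Elasticsearch client, and its regex is a simplification of the real analyzer (which uses Unicode text segmentation), so treat it as an illustration for simple values like these.

```python
import json
import re


def approx_standard_tokens(text):
    """Rough local approximation of the standard analyzer:
    split on non-word characters, then lowercase each token."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


# Request body for `POST /_analyze`; the cluster returns the real tokens.
analyze_body = {"analyzer": "standard", "text": "can-prod"}

print(json.dumps(analyze_body))
print(approx_standard_tokens("can-prod"))  # ['can', 'prod']
print(approx_standard_tokens("ERROR"))     # ['error']
```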

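The terms query matches exact terms: the search is case-sensitive and the search values are not analyzed. Because Elasticsearch split and lowercased your text at index time, the index holds can and prod rather than can-prod, and error rather than ERROR, so the exact search values are never found.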

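To solve this, map these 3 fields (can.deployment, can.level, can.class) with a keyword type, which tells Elasticsearch not to analyze the value and to index it as is.

You can create the mapping for these 3 fields like this, for example (the mapping added to the question already defines equivalent keyword sub-fields):

Mapping: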

    {
        "mappings": {
            "properties": {
                "can.class": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword"
                        }
                    }
                },
                "can.deployment": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword"
                        }
                    }
                },
                "can.level": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword"
                        }
                    }
                }
            }
        }
    }
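
As a rough illustration of what such a multi-field mapping gives you, the sketch below models the terms indexed for a text field and for its keyword sub-field, then does an exact lookup the way a terms query would. `text_terms`, `keyword_terms` and `terms_query_hits` are hypothetical stand-ins, not Elasticsearch APIs, and the tokenizer only approximates the standard analyzer.

```python
import re


def text_terms(value):
    """Terms indexed for a `text` field (approximate standard analyzer)."""
    return {tok.lower() for tok in re.findall(r"\w+", value)}


def keyword_terms(value):
    """A `keyword` field indexes the whole value as one unmodified term."""
    return {value}


def terms_query_hits(indexed_terms, wanted):
    """A terms query is not analyzed: it hits when any wanted value
    is exactly equal to one of the indexed terms."""
    return any(w in indexed_terms for w in wanted)


deployments = ["can-prod", "can-test", "can-dev"]
levels = ["WARN", "ERROR"]

print(terms_query_hits(text_terms("can-prod"), deployments))     # False
print(terms_query_hits(keyword_terms("can-prod"), deployments))  # True
print(terms_query_hits(text_terms("ERROR"), levels))             # False
print(terms_query_hits(keyword_terms("ERROR"), levels))          # True
```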

Now you can run the terms search against these keyword fields:

Search query:

{ "query": {
        "bool": {
           "minimum_should_match": 1,
           "should": {
                "terms": {
                    "can.deployment.keyword": ["can-prod", "can-test", "can-dev"]
                }
           },
            "filter": [ {
                    "terms": {
                        "can.level.keyword": ["WARN", "ERROR"]
                    }
            }, {
                    "terms": {
                        "can.class.keyword": ["MTMessage", "ParserService", "JsonParser"]
                    }
            }]
        }
    }
}
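
The query above leaves out the @timestamp range from the question. A minimal sketch of assembling the complete request body follows, assuming the `.keyword` sub-fields exist as in the mapping shown in the question. `build_query` is a hypothetical helper introduced only for illustration. It just produces the JSON to send to `_search`.

```python
import json


def build_query(deployments, levels, classes, gte, lt):
    """Assemble the bool query body (illustrative helper, not a library API)."""
    return {
        "query": {
            "bool": {
                "minimum_should_match": 1,
                "should": [
                    {"terms": {"can.deployment.keyword": deployments}}
                ],
                "filter": [
                    {"range": {"@timestamp": {"gte": gte, "lt": lt}}},
                    {"terms": {"can.level.keyword": levels}},
                    {"terms": {"can.class.keyword": classes}},
                ],
            }
        }
    }


body = build_query(
    ["can-prod", "can-test", "can-dev"],
    ["WARN", "ERROR"],
    ["MTMessage", "ParserService", "JsonParser"],
    "2020-05-02T17:22:29.069Z",
    "2020-05-23T17:23:29.069Z",
)
# Use this string as the request body of the _search call.
print(json.dumps(body, indent=2))
```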

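This terms query only supports exact, case-sensitive matching. See the Elasticsearch documentation on the terms query for more details.

If you want a case-insensitive search, you can use a match query to do the same thing:

Search query: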

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "can.level": "warn error"
                    }
                },
                {
                    "match": {
                        "can.class": "MTMessage ParserService JsonParser"
                    }
                },
                {
                    "match": {
                        "can.deployment": "can-test can-prod can-dev"
                    }
                }
            ]
        }
    }
}

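This works because, by default, Elasticsearch analyzes the match query text with the same analyzer that was used at index time. In your case that is the standard analyzer, so the match query text is split into tokens and lowercased as well. See the Elasticsearch documentation on the match query for more details.

For example, the search value MTMessage ParserService JsonParser is analyzed internally as: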

{
    "tokens": [
        {
            "token": "mtmessage",
            "start_offset": 0,
            "end_offset": 9,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "parserservice",
            "start_offset": 10,
            "end_offset": 23,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "jsonparser",
            "start_offset": 24,
            "end_offset": 34,
            "type": "<ALPHANUM>",
            "position": 2
        }
    ]
}

Because the values of this field in your documents were analyzed the same way, they will match.

There is one problem with the value can-test can-prod can-dev, which is analyzed as:

{
    "tokens": [
        {
            "token": "can",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "test",
            "start_offset": 4,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "can",
            "start_offset": 9,
            "end_offset": 12,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "prod",
            "start_offset": 13,
            "end_offset": 17,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "can",
            "start_offset": 18,
            "end_offset": 21,
            "type": "<ALPHANUM>",
            "position": 4
        },
        {
            "token": "dev",
            "start_offset": 22,
            "end_offset": 25,
            "type": "<ALPHANUM>",
            "position": 5
        }
    ]
}

Now, if a document like this exists in your index:

{
    "can.deployment": "can",
    "can.level": "WARN",
    "can.class": "JsonParser"
}

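then that document will also show up in your search results, because the match query only needs one of its tokens (here can) to match.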
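
A small model of this over-matching, assuming the match query's default OR operator. `approx_tokens` and `match_hits` are hypothetical helpers, the tokenizer only approximates the standard analyzer, and `can-staging` and `other-env` are made-up values for illustration.

```python
import re


def approx_tokens(text):
    """Approximate standard analyzer: split on non-word chars, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def match_hits(field_value, query_text):
    """Default match behaviour (operator OR): hit when any query token
    is among the tokens indexed for the document's field value."""
    doc_tokens = set(approx_tokens(field_value))
    return any(tok in doc_tokens for tok in approx_tokens(query_text))


query_text = "can-test can-prod can-dev"
print(match_hits("can-prod", query_text))     # True, the intended hit
print(match_hits("can", query_text))          # True, via the shared token "can"
print(match_hits("can-staging", query_text))  # True, also unintended
print(match_hits("other-env", query_text))    # False
```

If this kind of over-matching matters for your data, the terms query on the keyword sub-fields is the safer choice.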

So, depending on the kind of search you want to run and the data you have, you can decide whether to use a terms query or a match query.