How to match multiple inputs in Elasticsearch

Question

I am trying to query all matching logs across 3 environments (dev, test, prod) with the terms query below. I have tried both must and should.

curl -vs -o -X POST http://localhost:9200/*/_search?pretty=true -d '
{
    "query": {
        "bool": {
           "minimum_should_match": 1,
           "should": {
                "terms": {
                    "can.deployment": ["can-prod", "can-test", "can-dev"]
                }
           },
            "filter": [{
                    "range": {
                        "@timestamp": {
                            "gte": "2020-05-02T17:22:29.069Z",
                            "lt": "2020-05-23T17:23:29.069Z"
                        }
                    }
            }, {
                    "terms": {
                        "can.level": ["WARN", "ERROR"]
                    }
            }, {
                    "terms": {
                        "can.class": ["MTMessage", "ParserService", "JsonParser"]
                    }
            }]
        }
    }
}'

This returns:

{
  "took" : 871,
  "timed_out" : false,
  "_shards" : {
    "total" : 391,
    "successful" : 389,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

However, if I replace terms with match, it works. But then I cannot query for the other inputs, e.g. WARN messages as well, or logs whose class is ParserService, and so on:

curl -vs -o -X POST http://localhost:9200/*/_search?pretty=true -d '
  {
    "query": {
        "bool": {
            "should": 
                [{"match": {"can.deployment": "can-prod"}}],
            "filter": [{
                    "range": {
                        "@timestamp": {
                            "gte": "2020-03-20T17:22:29.069Z",
                            "lt": "2020-05-01T17:23:29.069Z"
                        }
                    }
            },{
                    "match": {
                        "can.level": "ERROR"
                    }
            },{
                    "match": {
                        "can.class": "MTMessage"
                    }
            }
        ]
        }
    }
  }'

How can I accomplish this, with or without terms/match? I tried the following, with no luck. I get 0 search results:

                    "match": {
                        "can.level": "ERROR"
                    }
            },{
                    "match": {
                        "can.level": "WARN"
                    }
            },{
                    "match": {
                        "can.class": "MTMessage"
                    }
            }

Any hints would certainly help. TIA!

[EDIT] Adding the mapping (/_mapping?pretty=true):

          "can" : {
            "properties" : {
              "class" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "deployment" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "level" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },

Adding a sample document:

{
  "took" : 50,
  "timed_out" : false,
  "_shards" : {
    "total" : 391,
    "successful" : 387,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 5.44714,
    "hits" : [
      {
        "_index" : "filebeat-6.1.2-2020.05.21",
        "_type" : "doc",
        "_id" : "AXI9K_cggA4T9jvjZc03",
        "_score" : 5.44714,
        "_source" : {
          "@timestamp" : "2020-05-21T02:59:25.373Z",
          "offset" : 34395681,
          "beat" : {
            "hostname" : "4c80d1588455-661e-7054-a4e5-73c821d7",
            "name" : "4c80d1588455-661e-7054-a4e5-73c821d7",
            "version" : "6.1.2"
          },
          "prospector" : {
            "type" : "log"
          },
          "source" : "/var/logs/packages/gateway_mt/1a27957180c2b57a53e76dd686a06f4983bf233f/logs/gateway_mt.log",
          "message" : "[2020-05-21 02:59:25.373] ERROR can_gateway_mt [ActiveMT SNAP Worker 18253] --- ClientIdAuthenticationFilter: Cannot authorize publishing from client ThingPayload_4325334a89c9 : not authorized",
          "fileset" : {
            "module" : "can",
            "name" : "services"
          },
          "fields" : { },
          "can" : {
            "component" : "can_gateway_mt",
            "instancename" : "canservices/0",
            "level" : "ERROR",
            "thread" : "ActiveMT SNAP Worker 18253",
            "message" : "Cannot authorize publishing from client ThingPayload_4325334a89c9 : not authorized",
            "class" : "ClientIdAuthenticationFilter",
            "timestamp" : "2020-05-21 02:59:25.373",
            "deployment" : "can-prod"
          }
        }
      }
    ]
  }
}

Expected output: I am trying to get a dump of every full document that matches the criteria, similar to the sample document above.

Answer 1

"query": {
        "bool": {
           "minimum_should_match": 1,
           "should": {
                "terms": {
                    "can.deployment": ["can-prod", "can-test", "can-dev"]
                }
           },
            "filter": [{
                    "range": {
                        "@timestamp": {
                            "gte": "2020-05-02T17:22:29.069Z",
                            "lt": "2020-05-23T17:23:29.069Z"
                        }
                    }
            }, {
                    "terms": {
                        "can.level": ["WARN", "ERROR"]
                    }
            }, {
                    "terms": {
                        "can.class": ["MTMessage", "ParserService", "JsonParser"]
                    }
            }]
        }
    }

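I think the search query above does not work because your fields can.deployment, can.level and can.class are text fields. Elasticsearch analyzes text fields with the standard analyzer by default, which splits the text on word boundaries and lowercases every token. The Elasticsearch documentation on the standard analyzer has more details.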

In your case, for example, the can.deployment field value can-prod is analyzed as:

{
    "tokens": [
        {
            "token": "can",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "prod",
            "start_offset": 4,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

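A terms query matches exact terms (the comparison is case-sensitive), but because Elasticsearch analyzes your text, splitting it into tokens and lowercasing them, the exact search text is not among the indexed terms and nothing is found.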
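The mismatch can be reproduced without a cluster. The sketch below is only a rough approximation of the standard analyzer (a regex split on non-alphanumeric characters plus lowercasing, whereas the real analyzer uses Unicode text segmentation), and `analyze` and `terms_matches` are hypothetical helper names, not Elasticsearch APIs.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def terms_matches(indexed_tokens, query_terms):
    """A terms query compares each query term verbatim against the indexed terms."""
    return any(term in indexed_tokens for term in query_terms)

# Terms that end up in the index for the text fields of the sample document.
deployment_tokens = analyze("can-prod")
level_tokens = analyze("ERROR")

print(deployment_tokens)  # ['can', 'prod']
print(level_tokens)       # ['error']

# Neither "can-prod" nor "ERROR" exists as an indexed term, so terms finds nothing.
print(terms_matches(deployment_tokens, ["can-prod", "can-test", "can-dev"]))  # False
print(terms_matches(level_tokens, ["WARN", "ERROR"]))                         # False

# A keyword sub-field keeps the whole value as one unmodified term.
print(terms_matches(["can-prod"], ["can-prod", "can-test", "can-dev"]))       # True
```

Since every filter clause must match, one failing terms clause is enough to bring the whole query down to 0 hits.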

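To solve this, when creating the index mapping for these 3 fields (can.deployment, can.level and can.class), you can add a keyword sub-field, which basically tells Elasticsearch not to analyze the value and to index it as is. The mapping posted in the question already contains such keyword sub-fields, so in that case only the query needs to change.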

You can create the mapping for these 3 fields like this:

Mapping:

{
    "mappings": {
        "properties": {
            "can.class": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                }
            },
            "can.deployment": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                }
            },
            "can.level": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

Now you can run the terms search against these keyword fields:

Search query:

{ "query": {
        "bool": {
           "minimum_should_match": 1,
           "should": {
                "terms": {
                    "can.deployment.keyword": ["can-prod", "can-test", "can-dev"]
                }
           },
            "filter": [ {
                    "terms": {
                        "can.level.keyword": ["WARN", "ERROR"]
                    }
            }, {
                    "terms": {
                        "can.class.keyword": ["MTMessage", "ParserService", "JsonParser"]
                    }
            }]
        }
    }
}

This terms query only does exact, case-sensitive matching. The Elasticsearch documentation on the terms query has more details.
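The query above leaves out the @timestamp range from the question. As a sketch, the complete request body could be assembled as below. `build_query` is a hypothetical helper, the values are the ones from the question, and the code only builds and prints JSON without contacting a cluster. It also moves the deployment clause from should into filter; a terms clause already matches any of its listed values, so for a single clause this should select the same documents, just without scoring.

```python
import json

def build_query(deployments, levels, classes, gte, lt):
    """Build a bool query whose clauses are all non-scoring filters on keyword sub-fields."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"@timestamp": {"gte": gte, "lt": lt}}},
                    {"terms": {"can.deployment.keyword": deployments}},
                    {"terms": {"can.level.keyword": levels}},
                    {"terms": {"can.class.keyword": classes}},
                ]
            }
        }
    }

body = build_query(
    deployments=["can-prod", "can-test", "can-dev"],
    levels=["WARN", "ERROR"],
    classes=["MTMessage", "ParserService", "JsonParser"],
    gte="2020-05-02T17:22:29.069Z",
    lt="2020-05-23T17:23:29.069Z",
)

# The printed JSON can be passed as the request body of a _search call.
print(json.dumps(body, indent=2))
```

Each terms clause is an OR over its own values, while the clauses in filter are combined with AND, which is the combination the question asks for.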

Answer 2

If you want a case-insensitive search, you can use a match query to do the same:

Search query:

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "can.level": "warn error"
                    }
                },
                {
                    "match": {
                        "can.class": "MTMessage ParserService JsonParser"
                    }
                },
                {
                    "match": {
                        "can.deployment": "can-test can-prod can-dev"
                    }
                }
            ]
        }
    }
}

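This works because, by default, Elasticsearch analyzes the match query text with the same analyzer that was used at index time. Since in your case that is the standard analyzer, the query text is split into tokens and lowercased. The Elasticsearch documentation on the match query has more details.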

For example, the search value MTMessage ParserService JsonParser is analyzed internally as:

{
    "tokens": [
        {
            "token": "mtmessage",
            "start_offset": 0,
            "end_offset": 9,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "parserservice",
            "start_offset": 10,
            "end_offset": 23,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "jsonparser",
            "start_offset": 24,
            "end_offset": 34,
            "type": "<ALPHANUM>",
            "position": 2
        }
    ]
}

Since the values of this field in your documents were analyzed the same way, they match.

There is one problem with the value can-test can-prod can-dev, which is analyzed as:

{
    "tokens": [
        {
            "token": "can",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "test",
            "start_offset": 4,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "can",
            "start_offset": 9,
            "end_offset": 12,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "prod",
            "start_offset": 13,
            "end_offset": 17,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "can",
            "start_offset": 18,
            "end_offset": 21,
            "type": "<ALPHANUM>",
            "position": 4
        },
        {
            "token": "dev",
            "start_offset": 22,
            "end_offset": 25,
            "type": "<ALPHANUM>",
            "position": 5
        }
    ]
}

Now, if your index contains a document like this:

{
    "can.deployment": "can",
    "can.level": "WARN",
    "can.class": "JsonParser"

}

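then this document will also show up in your search results, because a match query combines its tokens with OR by default and the single shared token can is enough to match.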
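The over-matching can be illustrated with a small simulation. It is a sketch under simplifying assumptions: `analyze` is a rough regex approximation of the standard analyzer and `match` is a hypothetical helper that mimics only the default OR behaviour of a match query, with no scoring.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def match(field_value, query_text):
    """Approximate a match query with the default OR operator:
    the document matches if it shares at least one token with the query."""
    doc_tokens = set(analyze(field_value))
    query_tokens = set(analyze(query_text))
    return bool(doc_tokens & query_tokens)

query = "can-test can-prod can-dev"

print(match("can-prod", query))     # True, the intended hit
print(match("can", query))          # True, matches on the token "can" alone
print(match("can-staging", query))  # True, an unrelated deployment sharing "can"

# Case no longer matters, because both sides are lowercased before comparison.
print(match("ERROR", "warn error"))  # True
```

If only the three exact deployment names should match, the terms query on the keyword sub-field avoids these extra hits.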

So, depending on the kind of search you want to perform and the data you are searching, you can decide whether to use a terms query or a match query.
