How to match exact document data in Elasticsearch using a DSL query?

Question

My tokenizer:

 "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }

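I am trying to search on this field by prefix. If I search with the token s, I should get the items that match or start with s. If I search with sp, I want only the items that start with sp, and everything else should be discarded. Instead, I also get other items back. I can't tell whether my query is wrong or whether I am using the filter incorrectly. Can someone please help me with this?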

 {
     "query": {
      "bool": {
       "must": [
        {
         "multi_match": {
          "query": "PRODUCT",
          "fields": [
           "item",
           "data1"
          ]
         }
        },
        {
         "multi_match": {
          "query": "SUB_FAMILY",
          "fields": [
           "item",
           "data1"
          ]
         }
        },
        {
         "match": {
          "values": "SP"
         }
        }
       ]
      }
     }
    }

The output of this query is:

 "hits": [
                {
                    "_index": "logs_datas",
                    "_type": "_doc",
                    "_id": "H1PfEnkBQXpKNrJSp8bV",
                    "_score": 9.418445,
                    "_source": {
                        "message": "PRODUCT,SUB_FAMILY,SPRINHO2H",
                        "path": "/home/elasticsearchDatas.csv",
                        "hierarchy_name": "PRODUCT",
                        "@version": "1",
                        "@timestamp": "2021-04-27T10:28:37.578Z",
                        "host": "ewiglp71",
                        "item_pk": "SPRINHO2H",
                        "attribute_name": "SUB_FAMILY"
                    }
                },
                {
                    "_index": "logs_datas",
                    "_type": "_doc",
                    "_id": "y1PfEnkBQXpKNrJSp8XQ",
                    "_score": 5.3059187,
                    "_source": {
                        "message": "PRODUCT,SUB_FAMILY,SCMLPLWVI",
                        "path": "/home/niteshb/elasticsearchDatas.csv",
                        "hierarchy_name": "PRODUCT",
                        "@version": "1",
                        "@timestamp": "2021-04-27T10:28:37.577Z",
                        "host": "ewiglp71",
                        "item_pk": "SCMLPLWVI",
                        "attribute_name": "SUB_FAMILY"
                    }
                },
                {
                    "_index": "logs_datas",
                    "_type": "_doc",
                    "_id": "zFPfEnkBQXpKNrJSp8XQ",
                    "_score": 5.3059187,
                    "_source": {
                        "message": "PRODUCT,SUB_FAMILY,SSVRKEN2Z",
                        "path": "/home/elasticsearchDatas.csv",
                        "hierarchy_name": "PRODUCT",
                        "@version": "1",
                        "@timestamp": "2021-04-27T10:28:37.579Z",
                        "host": "ewiglp71",
                        "item_pk": "SSVRKEN2Z",
                        "attribute_name": "SUB_FAMILY"
                    }
                }
            ]
        }
    }

Answer 1

Because min_gram is 1, the tokens generated for SCMLPLWVI will be:

{
  "tokens": [
    {
      "token": "S",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "SC",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "SCM",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "SCML",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 3
    },
    {
      "token": "SCMLP",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 4
    },
    {
      "token": "SCMLPL",
      "start_offset": 0,
      "end_offset": 6,
      "type": "word",
      "position": 5
    },
    {
      "token": "SCMLPLW",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 6
    },
    {
      "token": "SCMLPLWV",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 7
    },
    {
      "token": "SCMLPLWVI",
      "start_offset": 0,
      "end_offset": 9,
      "type": "word",
      "position": 8
    }
  ]
}
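The token list above can be reproduced with a small Python sketch. It only approximates the edge_ngram tokenizer (ASCII letters and digits, no token filters), and the helper name edge_ngrams is hypothetical. It also assumes the query text SP goes through the same tokenizer at search time, which is why the query ends up sharing the token S with this document.

```python
import re

def edge_ngrams(text, min_gram=1, max_gram=10):
    """Approximate an edge_ngram tokenizer with token_chars [letter, digit]."""
    tokens = []
    # Split on anything that is not an ASCII letter or digit (a simplification).
    for word in re.findall(r"[A-Za-z0-9]+", text):
        # Emit prefixes of the word, from min_gram up to max_gram characters.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
    return tokens

index_tokens = edge_ngrams("SCMLPLWVI")
query_tokens = edge_ngrams("SP")

print(index_tokens)
print(query_tokens)
# Tokens shared by the query and the document; one shared token is
# enough for a match query to return the document.
print(sorted(set(index_tokens) & set(query_tokens)))
```

In this model the query SP and the value SCMLPLWVI share only the token S, which is enough for the document to be returned.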

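Assuming the same analyzer also runs at search time (the default when no search_analyzer is set), the query SP is split into the tokens S and SP, and the S token matches every value that starts with S. If you want only the values that start with sp, modify the tokenizer so that single-character tokens are no longer produced: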

 "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,          // note this: raised from 1 to 2
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
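To see what the min_gram change does to matching, here is a rough Python model. The helpers edge_ngrams and matching_items are hypothetical, the tokenizer is approximated for ASCII letters and digits only, and the model assumes that the same tokenizer is applied to both the indexed value and the query text, with a document matching when the two share at least one token.

```python
import re

def edge_ngrams(text, min_gram, max_gram=10):
    """Approximate edge_ngram tokens for each run of ASCII letters/digits."""
    return [word[:size]
            for word in re.findall(r"[A-Za-z0-9]+", text)
            for size in range(min_gram, min(max_gram, len(word)) + 1)]

def matching_items(query, items, min_gram):
    """Return the items that share at least one token with the query."""
    query_tokens = set(edge_ngrams(query, min_gram))
    return [item for item in items
            if query_tokens & set(edge_ngrams(item, min_gram))]

ITEMS = ["SPRINHO2H", "SCMLPLWVI", "SSVRKEN2Z"]

print(matching_items("SP", ITEMS, min_gram=1))  # query tokens: S, SP
print(matching_items("SP", ITEMS, min_gram=2))  # query tokens: SP
print(matching_items("S", ITEMS, min_gram=2))   # query yields no tokens
```

Under these assumptions, min_gram 2 narrows the SP search to SPRINHO2H, but a one-character search such as S no longer produces any token, and a longer query such as SPR would still emit the token SP.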

Update 1:

You can use match_bool_prefix to search for words that start with s or sp.

Adding a working example:

Index mapping:

{
  "mappings": {
    "properties": {
      "item_pk": {
        "type": "text"
      }
    }
  }
}

Search query 1:

{
  "query": {
    "match_bool_prefix" : {
      "item_pk" : "s"
    }
  }
}

The search result will be:

"hits": [
      {
        "_index": "67281810",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "message": "PRODUCT,SUB_FAMILY,SPRINHO2H",
          "path": "/home/niteshb/elasticsearchDatas.csv",
          "hierarchy_name": "PRODUCT",
          "@version": "1",
          "@timestamp": "2021-04-27T10:28:37.578Z",
          "host": "ewiglp71",
          "item_pk": "SPRINHO2H",
          "attribute_name": "SUB_FAMILY"
        }
      },
      {
        "_index": "67281810",
        "_type": "_doc",
        "_id": "i7quE3kB6jKCA-nFYii6",
        "_score": 1.0,
        "_source": {
          "message": "PRODUCT,SUB_FAMILY,SCMLPLWVI",
          "path": "/home/niteshb/elasticsearchDatas.csv",
          "hierarchy_name": "PRODUCT",
          "@version": "1",
          "@timestamp": "2021-04-27T10:28:37.577Z",
          "host": "ewiglp71",
          "item_pk": "SCMLPLWVI",
          "attribute_name": "SUB_FAMILY"
        }
      },
      {
        "_index": "67281810",
        "_type": "_doc",
        "_id": "jLquE3kB6jKCA-nFgiju",
        "_score": 1.0,
        "_source": {
          "message": "PRODUCT,SUB_FAMILY,SSVRKEN2Z",
          "path": "/home/niteshb/elasticsearchDatas.csv",
          "hierarchy_name": "PRODUCT",
          "@version": "1",
          "@timestamp": "2021-04-27T10:28:37.579Z",
          "host": "ewiglp71",
          "item_pk": "SSVRKEN2Z",
          "attribute_name": "SUB_FAMILY"
        }
      }
    ]

Search query 2:

{
  "query": {
    "match_bool_prefix" : {
      "item_pk" : "sp"
    }
  }
}

Search result:

"hits": [
      {
        "_index": "67281810",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "message": "PRODUCT,SUB_FAMILY,SPRINHO2H",
          "path": "/home/niteshb/elasticsearchDatas.csv",
          "hierarchy_name": "PRODUCT",
          "@version": "1",
          "@timestamp": "2021-04-27T10:28:37.578Z",
          "host": "ewiglp71",
          "item_pk": "SPRINHO2H",
          "attribute_name": "SUB_FAMILY"
        }
      }
    ]
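The two results above follow from how match_bool_prefix treats the query text: every term except the last becomes an exact term clause, and the last term becomes a prefix clause. The Python sketch below is a simplified model of that behaviour, not Elasticsearch code. The analyze helper is a rough stand-in for the standard analyzer (lowercasing and splitting on non-alphanumerics), which is assumed here because the mapping declares item_pk as a plain text field.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_bool_prefix(query, field_value):
    """Model of match_bool_prefix with the default OR operator."""
    terms = analyze(query)
    if not terms:
        return False
    indexed = analyze(field_value)
    # Every term except the last is an exact term clause;
    # the last term is matched as a prefix.
    *exact_terms, prefix = terms
    return (any(term in indexed for term in exact_terms)
            or any(token.startswith(prefix) for token in indexed))

ITEMS = ["SPRINHO2H", "SCMLPLWVI", "SSVRKEN2Z"]

print([item for item in ITEMS if match_bool_prefix("s", item)])
print([item for item in ITEMS if match_bool_prefix("sp", item)])
```

Because the indexed values are lowercased in this model, the lowercase query sp still matches the uppercase value SPRINHO2H.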

Update 2:

Try this query:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "hierarchy_name": "PRODUCT"
          }
        },
        {
          "match": {
            "attribute_name": "SUB_FAMILY"
          }
        },
        {
          "match_bool_prefix": {
            "item_pk": "sp"
          }
        }
      ]
    }
  }
}
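If the prefix comes from user input, the same request body can be assembled programmatically. This is a minimal sketch: the function name build_prefix_query is hypothetical, and it only builds and prints the JSON body without sending it to a cluster.

```python
import json

def build_prefix_query(hierarchy_name, attribute_name, item_prefix):
    """Build the bool/must body shown above for a given item_pk prefix."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"hierarchy_name": hierarchy_name}},
                    {"match": {"attribute_name": attribute_name}},
                    # The last term of the text is matched as a prefix.
                    {"match_bool_prefix": {"item_pk": item_prefix}},
                ]
            }
        }
    }

body = build_prefix_query("PRODUCT", "SUB_FAMILY", "sp")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the search body to whichever HTTP or Elasticsearch client you already use.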

elasticsearch

logstash

kibana

elasticsearch-dsl

elasticsearch-5