Elasticsearch returns zero hits when using termQuery in QueryBuilders

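I am building a Java application that searches data in Elasticsearch (the data flows from kafka to logstash and then into elasticsearch as JSON). When I use QueryBuilders.queryStringQuery(reqId), all results come back without any problem, but when I use QueryBuilders.termQuery("routingRequestID", reqId); I get 0 hits even though the reqId exists in the ES data.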


    RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));

    @GetMapping("/q/{reqId}")
    public String searchByReqId(@PathVariable("reqId") final String reqId) throws IOException {
        String[] indexes = {"devglan-log-test"};

        QueryBuilder queryBuilder = QueryBuilders.termQuery("routingRequestID", reqId);
        // QueryBuilder queryBuilder = QueryBuilders.queryStringQuery(reqId);

        SearchSourceBuilder searchSource = SearchSourceBuilder.searchSource().query(queryBuilder).from(0).size(1000);
        System.out.println(searchSource.query());

        SearchRequest searchRequest = new SearchRequest(indexes, searchSource);
        System.out.println(searchRequest.source().toString());

        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(searchResponse.toString());
        SearchHits hits = searchResponse.getHits();
        SearchHit[] searchHits = hits.getHits();
        for (SearchHit hit : searchHits) {
            System.out.println(hit.toString());
        }

        return "success";
    }

A document as stored in the index, taken from a search response:

{
   took: 633,
   timed_out: false,
   _shards: {
      total: 1,
      successful: 1,
      skipped: 0,
      failed: 0
   },
   hits: {
      total: {
         value: 1,
         relation: "eq"
      },
      max_score: 1.6739764,
      hits: [
      {
         _index: "devglan-log-test",
         _type: "_doc",
         _id: "k4qAPXEBCzyTR4XVXPb2",
         _score: 1.6739764,
         _source: {
            @version: "1",
            message: "
                      {"requestorRole":"role3", "requestorGivenName":"doe", "requestorSurName":"male", 
                       "requestorOrganizationName":"dob", "reqd":"address", 
                       "requestorC":"city", "routingRequestID":"7778787898778879"}",
            @timestamp: "2020-04-03T00:45:53.917Z"
        }
      }
    ]
  }
}

Query generated by searchSource.query():

{
  "term" : {
    "routingRequestID" : {
      "value" : "2421",
      "boost" : 1.0
    }
  }
}

Query generated by searchRequest.source().toString():

{"from":0,"size":1000,"query":{"term":{"routingRequestID":{"value":"2421","boost":1.0}}}}

Result:

{"took":0,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]}}

Any help is much appreciated. If you know how to help, please don't skip this post.

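You haven't provided the index mapping, sample documents, or the documents you expect for your search term, so I am guessing from the information available: the problem lies in how routingRequestID is mapped and in the type of query you are using.

It looks like routingRequestID is defined as text, which uses the standard analyzer by default. When you use a query string query, Elasticsearch applies the same analyzer that was used at index time, as the query string query documentation states: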

The query then analyzes each split text independently before returning matching documents.

When you use termQuery, however, the search text is not analyzed and is used exactly as passed in the query, as the term query doc explains:

Returns documents that contain an exact term in a provided field.
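
The difference between an analyzed and a non-analyzed lookup can be illustrated with a small standalone sketch. This is not Elasticsearch code: the class `TermVsMatchSketch`, its methods, and the sample text `Request REQ-2421 accepted` are all hypothetical, and `analyze` is only a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model of one analyzed text field; hypothetical, not the Elasticsearch API.
class TermVsMatchSketch {
    // Maps each indexed token to the ids of the documents containing it.
    private final Map<String, Set<Integer>> invertedIndex = new HashMap<>();

    // Assumption: a simplified analyzer that lowercases the text and
    // splits it on anything that is not a letter or a digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index time: the text is analyzed and only the tokens are stored.
    void index(int docId, String text) {
        for (String token : analyze(text)) {
            invertedIndex.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(docId);
        }
    }

    // term-style lookup: the input is compared verbatim against stored tokens.
    Set<Integer> term(String value) {
        return invertedIndex.getOrDefault(value, Collections.emptySet());
    }

    // match-style lookup: the input is analyzed first, then each token is looked up.
    Set<Integer> match(String text) {
        Set<Integer> result = new LinkedHashSet<>();
        for (String token : analyze(text)) {
            result.addAll(term(token));
        }
        return result;
    }

    public static void main(String[] args) {
        TermVsMatchSketch field = new TermVsMatchSketch();
        field.index(1, "Request REQ-2421 accepted");
        System.out.println(field.term("REQ-2421"));  // prints []
        System.out.println(field.match("REQ-2421")); // prints [1]
    }
}
```

The verbatim lookup misses because the stored tokens are `req` and `2421`, not `REQ-2421`; the analyzed lookup finds the document because the search text goes through the same tokenization as the indexed text.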

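Solution:

If you want the same results as with the query string query, use a match query, which analyzes the search text.

I think you should check whether a document with routingRequestID = 2421 actually exists.

// This query is roughly equivalent to the SQL: select * from XXX where routingRequestID = 2421 limit 0,1000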
{"from":0,"size":1000,"query":{"term":{"routingRequestID":{"value":"2421","boost":1.0}}}}

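Your document does not have a field routingRequestID. It has a field message, and routingRequestID appears inside the JSON string stored in that field.

So the query to build should be: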

{
  "query": {
    "match": {
      "message.routingRequestID": "2421"
    }
  }
}

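So the problem was that all the information ended up in a single field. I solved it by changing the logstash configuration and then using matchQuery. If you are using kafka with JSON-formatted messages, you need to add the following to your logstash configuration file: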

input {
   kafka {
      bootstrap_servers => "kafka ip"
      topics => ["your kafka topics"]
   }
}
filter {
   json {
      source => "message"
   }
   mutate {
      remove_field => ["message"]
   }
}
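
To make the effect of this filter section concrete, here is a rough standalone sketch of what happens to an event: the JSON string in message is expanded into top-level fields and message itself is removed. The class `JsonFilterSketch` and the method `expandMessage` are hypothetical names, and the regex-based parsing is a simplification that assumes a flat object with string values and no escaped quotes; logstash itself uses a real JSON parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics the effect of the json filter plus remove_field.
class JsonFilterSketch {
    // Assumption: matches "key":"value" pairs of a flat, string-valued JSON object.
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"([^\"]*)\"");

    static Map<String, String> expandMessage(Map<String, String> event) {
        Map<String, String> result = new LinkedHashMap<>(event);
        // Corresponds to: mutate { remove_field => ["message"] }
        String message = result.remove("message");
        if (message != null) {
            // Corresponds to: json { source => "message" }
            Matcher m = PAIR.matcher(message);
            while (m.find()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("@version", "1");
        event.put("message",
                "{\"requestorRole\":\"role3\", \"routingRequestID\":\"7778787898778879\"}");
        // prints {@version=1, requestorRole=role3, routingRequestID=7778787898778879}
        System.out.println(expandMessage(event));
    }
}
```

Once routingRequestID is a top-level field of the indexed document, a query against that field name has something to match.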

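By the way, I am using elasticsearch 7.4, the latest logstash, and the latest kafka version. Good luck, and thanks to everyone who tried to help, I appreciate it! Here is the link to the logstash json filter plugin documentation, which walks you through the different options: https://www.elastic.co/guide/en/logstash/current/plugins-filters-json.html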