Mysteriously wrong values of numerical fields in ElasticSearch

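For the past 2 days I have been looking into this puzzling issue: I have an index with a custom mapping, and I run some aggregations on it. The problem is that the aggregation results for numerical fields contain values that do not appear in the database the data was imported from, even though the number of results is the same.

I found a similar issue here, where the problem was an inconsistent field mapping in the index, but in my case the field is mapped to the same type everywhere. As far as I have checked, the problem occurs with the following fields: award.value.amount, award.value.x_amountEur and tender.value.x_amountEur. This is my current mapping, as reported by curl -XGET 'http://localhost:9200/documents/_mappings?pretty&human' (the part that contains the targeted fields):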

     {
      "documents" : {
        "mappings" : {
          "document" : {
            "properties" : {
              "additionalIdentifiers" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "award" : {
                "type" : "nested",
                "properties" : {
                  "_id" : {
                    "properties" : {
                      "$oid" : {
                        "type" : "string"
                      }
                    }
                  },
                  "contract_number" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                  },
                  "date" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "x_day" : {
                        "type" : "integer"
                      },
                      "x_month" : {
                        "type" : "integer"
                      },
                      "x_year" : {
                        "type" : "integer"
                      }
                    }
                  },
                  "description" : {
                    "type" : "string"
                  },
                  "initialValue" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "amount" : {
                        "type" : "float"
                      },
                      "currency" : {
                        "type" : "string"
                      },
                      "x_vat" : {
                        "type" : "float"
                      }
                    }
                  },
                  "minValue" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "amount" : {
                        "type" : "float"
                      },
                      "x_amountEur" : {
                        "type" : "float"
                      }
                    }
                  },
                  "title" : {
                    "type" : "string"
                  },
                  "value" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "amount" : {
                        "type" : "float"
                      },
                      "currency" : {
                        "type" : "string"
                      },
                      "x_amountEur" : {
                        "type" : "float"
                      },
                      "x_vat" : {
                        "type" : "float"
                      },
                      "x_vatbool" : {
                        "type" : "boolean"
                      }
                    }
                  },
                  "x_initialValue" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "x_amountEur" : {
                        "type" : "float"
                      },
                      "x_vatbool" : {
                        "type" : "boolean"
                      }
                    }
                  }
                }
              },
              "awardCriteria" : {
                "type" : "string"
              },
              "contract_number" : {
                "type" : "string"
              },
              "document_id" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "numberOfTenderers" : {
                "type" : "string"
              },
              "procurementMethod" : {
                "type" : "string"
              },
              "procuring_entity" : {
                "type" : "nested",
                "properties" : {
                  "_id" : {
                    "properties" : {
                      "$oid" : {
                        "type" : "string"
                      }
                    }
                  },
                  "address" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "country" : {
                        "type" : "string"
                      },
                      "countryName" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                      },
                      "email" : {
                        "type" : "string"
                      },
                      "locality" : {
                        "type" : "string"
                      },
                      "postalCode" : {
                        "type" : "string"
                      },
                      "streetAddress" : {
                        "type" : "string"
                      },
                      "telephone" : {
                        "type" : "string"
                      },
                      "x_url" : {
                        "type" : "string"
                      }
                    }
                  },
                  "name" : {
                    "type" : "string"
                  },
                  "x_slug" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                  }
                }
              },
              "suppliers" : {
                "type" : "nested",
                "properties" : {
                  "_id" : {
                    "properties" : {
                      "$oid" : {
                        "type" : "string"
                      }
                    }
                  },
                  "address" : {
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "email" : {
                        "type" : "string"
                      },
                      "locality" : {
                        "type" : "string"
                      },
                      "postalCode" : {
                        "type" : "string"
                      },
                      "streetAddress" : {
                        "type" : "string"
                      },
                      "telephone" : {
                        "type" : "string"
                      },
                      "x_url" : {
                        "type" : "string"
                      }
                    }
                  },
                  "name" : {
                    "type" : "string"
                  },
                  "x_slug" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                  }
                }
              },
              "tender" : {
                "type" : "nested",
                "properties" : {
                  "_id" : {
                    "properties" : {
                      "$oid" : {
                        "type" : "string"
                      }
                    }
                  },
                  "value" : {
                    "type" : "nested",
                    "properties" : {
                      "_id" : {
                        "properties" : {
                          "$oid" : {
                            "type" : "string"
                          }
                        }
                      },
                      "amount" : {
                        "type" : "float"
                      },
                      "currency" : {
                        "type" : "string"
                      },
                      "x_amountEur" : {
                        "type" : "float"
                      },
                      "x_vat" : {
                        "type" : "float"
                      },
                      "x_vatbool" : {
                        "type" : "boolean"
                      }
                    }
                  }
                }
              }  
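
Since the similar issue mentioned above came down to inconsistent mappings, a quick way to confirm the types of the targeted fields is to walk the mapping programmatically. The following is only a minimal sketch: `mapped_type` is a hypothetical helper (not part of Elasticsearch or mongoid-elasticsearch), and the JSON is a trimmed excerpt of the mapping above, assumed to have been cut down to its "properties" object.

```ruby
require "json"

# Hypothetical helper: returns the mapped type for a dotted field path,
# or nil if the path does not exist in the mapping.
def mapped_type(properties, path)
  node = path.split(".").reduce({ "properties" => properties }) do |current, key|
    current && current.dig("properties", key)
  end
  node && node["type"]
end

# Trimmed excerpt of the mapping shown above (only the targeted value objects).
mapping_json = <<~MAPPING
  {
    "award": {
      "type": "nested",
      "properties": {
        "value": {
          "type": "nested",
          "properties": {
            "amount": { "type": "float" },
            "x_amountEur": { "type": "float" }
          }
        }
      }
    },
    "tender": {
      "type": "nested",
      "properties": {
        "value": {
          "type": "nested",
          "properties": {
            "x_amountEur": { "type": "float" }
          }
        }
      }
    }
  }
MAPPING

properties = JSON.parse(mapping_json)

# Print the mapped type of each field that shows unexpected values.
%w[award.value.amount award.value.x_amountEur tender.value.x_amountEur].each do |field|
  puts "#{field}: #{mapped_type(properties, field)}"
end
```

With the excerpt above, all three fields resolve to the same type, which matches the observation that the mapping itself is consistent.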

This is the aggregation I use to get the contract values for each supplier / procuring_entity pair:

    Document.es.search({
      "search_type": "count",
      "body": {
        "aggregations": {
          "entities": {
            "nested": {
              "path": "procuring_entity"
            },
            "aggs": {
              "procuring_entity_names": {
                "terms": {
                  "field": "procuring_entity.x_slug",
                  "size": 0
                },
                "aggs": {
                  "suppliers": {
                    "nested": {
                      "path": "suppliers"
                    },
                    "aggs": {
                      "suppliers_names": {
                        "terms": {
                          "field": "suppliers.x_slug",
                          "size": 0
                        },
                        "aggs": {
                          "awards": {
                            "nested": {
                              "path": "award.value"
                            },
                            "aggs": {
                              "award_amounts": {
                                "terms": {
                                  "field": "award.value.x_amountEur",
                                  "size": 0
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    })

The result with the float type is:

    {"entities"=>
     {"doc_count"=>24300,
      "procuring_entity_names"=>
       {"doc_count_error_upper_bound"=>0,
        "sum_other_doc_count"=>0,
        "buckets"=>
         [{"key"=>"vsia-bernu-kliniska-universitates-slimnica",
           "doc_count"=>1360,
           "suppliers"=>
            {"doc_count"=>1360,
             "suppliers_names"=>
              {"doc_count_error_upper_bound"=>0,
               "sum_other_doc_count"=>0,
               "buckets"=>
                [{"key"=>"recipe-plus-as",
                  "doc_count"=>388,
                  "awards"=>
                   {"doc_count"=>388,
                    "awards"=>
                     {"doc_count_error_upper_bound"=>0,
                      "sum_other_doc_count"=>0,
                      "buckets"=>
                       [{"key"=>3679.086669921875, "doc_count"=>373},
                        {"key"=>0.0, "doc_count"=>12},
                        {"key"=>73610.3203125, "doc_count"=>1},
                        {"key"=>244000.0, "doc_count"=>1},
                        {"key"=>342348.9375, "doc_count"=>1}]}}}

The problem is that the same query in MongoDB returns 388 documents, all of which have award.value.x_amountEur = 3679.08661250056, as this Mongoid query shows:

    Document.where(:"procuring_entity.x_slug" => "vsia-bernu-kliniska-universitates-slimnica")
            .keep_if{|doc| doc.suppliers.first.x_slug == "recipe-plus-as"}
            .map{|doc| doc.award.value.x_amountEur}.uniq 
    =>[3679.08661250056]

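A query run directly against MongoDB returns the same. I also tried mapping the targeted fields as double, which gave the same result, and as long, which returned the following (an even more incorrect result):

    {"entities"=>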
     {"doc_count"=>24300, 
      "procuring_entity_names"=> 
       {"doc_count_error_upper_bound"=>0, 
        "sum_other_doc_count"=>0, 
        "buckets"=> 
         [{"key"=>"vsia-bernu-kliniska-universitates-slimnica", 
           "doc_count"=>1360, 
           "suppliers"=> 
            {"doc_count"=>1360, 
             "suppliers_names"=> 
              {"doc_count_error_upper_bound"=>0, 
               "sum_other_doc_count"=>0, 
               "buckets"=> 
                [{"key"=>"recipe-plus-as", 
                  "doc_count"=>388, 
                  "awards"=> 
                   {"doc_count"=>388, 
                    "awards"=> 
                     {"doc_count_error_upper_bound"=>0, 
                      "sum_other_doc_count"=>0, 
                      "buckets"=> 
                       [{"key"=>3679, "doc_count"=>371}, 
                        {"key"=>0, "doc_count"=>12}, 
                        {"key"=>44300, "doc_count"=>1}, 
                        {"key"=>80472, "doc_count"=>1}, 
                        {"key"=>331636, "doc_count"=>1}, 
                        {"key"=>342348, "doc_count"=>1}, 
                        {"key"=>1658805, "doc_count"=>1}]}}}
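
As a side note on the bucket keys themselves: the key 3679.086669921875 in the float result is what 3679.08661250056 becomes when it is rounded to a 32-bit float, and 3679 is its integer part. The sketch below reproduces both conversions in plain Ruby. It only illustrates the arithmetic, under the assumption that a float mapping stores single-precision values and that a long mapping drops the fractional part. It says nothing about the unexpected doc counts or the extra keys.

```ruby
# Round a Ruby Float (64-bit) to the nearest 32-bit float and back,
# using Array#pack with the little-endian single-precision directive "e".
def to_float32(value)
  [value].pack("e").unpack("e").first
end

stored = 3679.08661250056       # value returned by the Mongoid query

as_float = to_float32(stored)   # what a single-precision field can represent
as_long  = stored.to_i          # fractional part dropped (assumed long behaviour)

puts as_float
puts as_long
```

So the largest bucket key is consistent with the stored value. The mismatch that remains unexplained is that only 373 (or 371) of the 388 documents fall into that bucket.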

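I am using Elasticsearch 2.0, mongoid 5.0.1 and mongoid-elasticsearch for indexing. I can't think of anything else to try, so any suggestions are welcome and appreciated.

I tried to test your scenario with ES 2.0, but I am missing something. I cannot get it to create buckets for award.value.x_amountEur unless I use a reverse_nested aggregation to "get out" of one nested path and into another.

So, instead of the awards aggregation you have, I am using the same aggregation, but "wrapped" in a reverse_nested aggregation: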

  "aggs": {
    "getting_back": {
      "reverse_nested": {},
      "aggs": {
        "awards": {
          "nested": {
            "path": "award.value"
          },
          "aggs": {
            "award_amounts": {
              "terms": {
                "field": "award.value.x_amountEur"
              }
            }
          }
        }
      }
    }
  }

With this, the results look fine to me.

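Later edit: following my suggestion and the more general one from @Val, the complete solution is to use reverse_nested on both the awards aggregation and the other aggregation that switches from one nested path to another.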
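
Putting the pieces together, the full request body might look roughly like the sketch below. It rests on assumptions: the second aggregation that needs reverse_nested is taken to be suppliers (it also moves from one nested path to another), and the `from_root` helper and the `back_to_root_for_*` aggregation names are made up for illustration. The field names, paths and `"size" => 0` are copied from the question.

```ruby
require "json"

# Hypothetical helper: wraps a named aggregation in reverse_nested so that it
# starts from the root document instead of the current nested path.
def from_root(name, aggregation)
  {
    "back_to_root_for_#{name}" => {
      "reverse_nested" => {},
      "aggs" => { name => aggregation }
    }
  }
end

# Innermost level: buckets of award amounts under the award.value nested path.
awards = {
  "nested" => { "path" => "award.value" },
  "aggs" => {
    "award_amounts" => {
      "terms" => { "field" => "award.value.x_amountEur", "size" => 0 }
    }
  }
}

# Supplier level: awards is reached via reverse_nested (assumption, see above).
suppliers = {
  "nested" => { "path" => "suppliers" },
  "aggs" => {
    "suppliers_names" => {
      "terms" => { "field" => "suppliers.x_slug", "size" => 0 },
      "aggs" => from_root("awards", awards)
    }
  }
}

# Top level: suppliers is also reached via reverse_nested.
body = {
  "aggregations" => {
    "entities" => {
      "nested" => { "path" => "procuring_entity" },
      "aggs" => {
        "procuring_entity_names" => {
          "terms" => { "field" => "procuring_entity.x_slug", "size" => 0 },
          "aggs" => from_root("suppliers", suppliers)
        }
      }
    }
  }
}

puts JSON.pretty_generate(body)
```

The resulting hash can be passed as the "body" of the same Document.es.search call used in the question. Note that the extra reverse_nested levels add one more layer of keys to the response, so code that reads the buckets has to step through them.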