Make a flat array from Elasticsearch query results

I have an index with the following documents (simplified):

{
    "user" : "j.johnson",
    "certifications" : [{
            "certification_date" : "2013-02-09T00:00:00+03:00",
            "previous_level" : "No Level",
            "obtained_level" : "Junior"
        }, {
            "certification_date" : "2014-05-26T00:00:00+03:00",
            "previous_level" : "Junior",
            "obtained_level" : "Middle"
        }
    ]
}

I just want a plain list of all certifications passed by all users where certification_date > 2014-01-01. It should be a fairly large array like this:

[{
        "certification_date" : "2014-09-08T00:00:00+03:00",
        "previous_level" : "No Level",
        "obtained_level" : "Junior"
    }, {
        "certification_date" : "2014-05-26T00:00:00+03:00",
        "previous_level" : "Junior",
        "obtained_level" : "Middle"
    }, {
        "certification_date" : "2015-01-26T00:00:00+03:00",
        "previous_level" : "Junior",
        "obtained_level" : "Middle"
    }
    ...
]

This doesn't look like a hard task, but I could not find a simple way to do it.

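I would do it with a parent/child relationship, although you will have to reorganize your data. I don't think you can get what you want with your current schema.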

More concretely, I set up an index like this, with user as the parent type and certification as the child type:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
   },
   "mappings": {
      "user": {
         "properties": {
            "user_name": { "type": "string" }
         }
      },
      "certification":{
          "_parent": { "type": "user" },
          "properties": {
              "certification_date": { "type": "date" },
              "previous_level": { "type": "string" },
              "obtained_level": { "type": "string" }
          }
      }
   }
}

Then I added some documents:

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"user","_id":1}}
{"user_name":"j.johnson"}
{"index":{"_index":"test_index","_type":"certification","_parent":1}}
{"certification_date" : "2013-02-09T00:00:00+03:00","previous_level" : "No Level","obtained_level" : "Junior"}
{"index":{"_index":"test_index","_type":"certification","_parent":1}}
{"certification_date" : "2014-05-26T00:00:00+03:00","previous_level" : "Junior","obtained_level" : "Middle"}
{"index":{"_index":"test_index","_type":"user","_id":2}}
{ "user_name":"b.bronson"}
{"index":{"_index":"test_index","_type":"certification","_parent":2}}
{"certification_date" : "2013-09-05T00:00:00+03:00","previous_level" : "No Level","obtained_level" : "Junior"}
{"index":{"_index":"test_index","_type":"certification","_parent":2}}
{"certification_date" : "2014-07-20T00:00:00+03:00","previous_level" : "Junior","obtained_level" : "Middle"}
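
If the existing data is in the nested shape from the question, the reorganization could be scripted. Here is a rough Python sketch that turns nested user documents into bulk lines like the ones above. The function name `to_bulk_lines` and the sequential user ids are my own assumptions for illustration, not part of the original setup:

```python
import json

def to_bulk_lines(users, index="test_index"):
    """Turn nested user documents into parent/child bulk lines (hypothetical helper)."""
    lines = []
    # Assumption: users get sequential ids starting at 1, as in the example above.
    for user_id, doc in enumerate(users, start=1):
        # Parent document: one per user.
        lines.append(json.dumps({"index": {"_index": index, "_type": "user", "_id": user_id}}))
        lines.append(json.dumps({"user_name": doc["user"]}))
        # Child documents: one per certification, linked to the user through _parent.
        for cert in doc.get("certifications", []):
            lines.append(json.dumps({"index": {"_index": index, "_type": "certification", "_parent": user_id}}))
            lines.append(json.dumps(cert))
    # The bulk body is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"

users = [{
    "user": "j.johnson",
    "certifications": [
        {"certification_date": "2013-02-09T00:00:00+03:00",
         "previous_level": "No Level", "obtained_level": "Junior"},
        {"certification_date": "2014-05-26T00:00:00+03:00",
         "previous_level": "Junior", "obtained_level": "Middle"},
    ],
}]

body = to_bulk_lines(users)
print(body)
```

The resulting string would be sent as the body of POST /test_index/_bulk.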

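Now I can search the certifications with a range filter (gte also matches 2014-01-01 itself; use gt if you need the strict > from the question):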

POST /test_index/certification/_search
{
   "query": {
      "constant_score": {
         "filter": {
            "range": {
               "certification_date": {
                  "gte": "2014-01-01"
               }
            }
         }
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "certification",
            "_id": "QGXHp7JZTeafWYzb_1FZiA",
            "_score": 1,
            "_source": {
               "certification_date": "2014-05-26T00:00:00+03:00",
               "previous_level": "Junior",
               "obtained_level": "Middle"
            }
         },
         {
            "_index": "test_index",
            "_type": "certification",
            "_id": "yvO2A9JaTieI5VHVRikDfg",
            "_score": 1,
            "_source": {
               "certification_date": "2014-07-20T00:00:00+03:00",
               "previous_level": "Junior",
               "obtained_level": "Middle"
            }
         }
      ]
   }
}

This structure is still not completely flat the way you asked for, but I think it is as close as ES will let you get.
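
The last bit of flattening can be done on the client. A minimal Python sketch, assuming the response has already been parsed into a dict (the helper name `flatten_hits` is made up for this example, and the response below is trimmed to the fields that matter):

```python
def flatten_hits(response):
    """Collect the _source of every hit into one flat list (hypothetical helper)."""
    return [hit["_source"] for hit in response["hits"]["hits"]]

# Trimmed copy of the search response shown above.
response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_type": "certification",
             "_source": {"certification_date": "2014-05-26T00:00:00+03:00",
                         "previous_level": "Junior",
                         "obtained_level": "Middle"}},
            {"_type": "certification",
             "_source": {"certification_date": "2014-07-20T00:00:00+03:00",
                         "previous_level": "Junior",
                         "obtained_level": "Middle"}},
        ],
    }
}

flat = flatten_hits(response)
print(flat)
```

Keep in mind that a search returns only the first 10 hits by default, so for a large result set you would need a bigger size or the scroll API before flattening.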

Here is the code I used:

http://sense.qbox.io/gist/3c733ec75e6c0856fa2772cc8f67bd7c00aba637