Parsing AWS ATHENA outputs

Question

I'm relatively new to Python and come from a node.js background. I'm having a lot of trouble parsing the output I get from get_query_results().

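I've been at this for hours. I tried iterating over ['ResultSetMetadata']['ColumnInfo'] to get the column names, but I can't work out how to tie ['ResultSet']['Data'] to those items so the code knows which name to apply to each dataValue.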
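One way to use ['ResultSetMetadata']['ColumnInfo'] directly: in the output below, its entries are in the same order as the values inside each row's 'Data' list, so zip() can pair each name with its value. This is a minimal sketch that assumes, as in that output, that the first entry of 'Rows' repeats the column names and should be skipped. The function name rows_from_column_info and the trimmed sample dict are illustrative only, and treating a missing 'VarCharValue' as None is an assumption about how empty cells appear.

```python
def rows_from_column_info(response):
    """Map each data row to a dict keyed by the names in ColumnInfo."""
    result_set = response['ResultSet']
    # Column names, in the same order as the values in each row's 'Data' list
    names = [col['Name'] for col in result_set['ResultSetMetadata']['ColumnInfo']]
    rows = []
    # Assumption: Rows[0] repeats the column names (as in the question's output), so skip it
    for row in result_set['Rows'][1:]:
        # A cell without 'VarCharValue' (assumed to be a NULL) becomes None instead of raising KeyError
        values = [datum.get('VarCharValue') for datum in row['Data']]
        rows.append(dict(zip(names, values)))
    return rows


# Trimmed-down version of the response shown in the question
sample = {
    'ResultSet': {
        'Rows': [
            {'Data': [{'VarCharValue': 'postcode'}, {'VarCharValue': 'CountOf'}]},
            {'Data': [{'VarCharValue': '1231'}, {'VarCharValue': '2'}]},
            {'Data': [{'VarCharValue': '1166'}, {'VarCharValue': '2'}]},
        ],
        'ResultSetMetadata': {
            'ColumnInfo': [{'Name': 'postcode'}, {'Name': 'CountOf'}],
        },
    }
}

print(rows_from_column_info(sample))
# [{'postcode': '1231', 'CountOf': '2'}, {'postcode': '1166', 'CountOf': '2'}]
```

Taking the names from ColumnInfo instead of the first row means the header comes from the metadata, while Rows[1:] still drops the repeated header row.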

I know I need to select the row headers and then add the associated objects to those rows, but the logic for doing something like this in Python escapes me.

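I can see that the first column name always lines up with the first ['Data']['VarCharValue'], so I can get all the values in order. But if I loop through ['ResultSet']['Rows'], how do I isolate the first iteration as the column names and then populate the remaining rows?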
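In Python, the usual way to treat the first iteration differently is to take an explicit iterator, pull the header off with next(), and let the loop continue from the second row (roughly what `const [header, ...rest] = rows` does in JavaScript). A minimal sketch; rows_to_dicts is a hypothetical helper name, and the sample rows are copied from the output below.

```python
def rows_to_dicts(rows):
    """Treat the first row as the header and zip it with every following row."""
    it = iter(rows)
    # next() consumes only the first row; None is returned if there are no rows at all
    first = next(it, None)
    if first is None:
        return []
    header = [datum.get('VarCharValue') for datum in first['Data']]
    # The comprehension resumes where next() left off, so it only sees the data rows
    return [
        dict(zip(header, [datum.get('VarCharValue') for datum in row['Data']]))
        for row in it
    ]


rows = [
    {'Data': [{'VarCharValue': 'postcode'}, {'VarCharValue': 'CountOf'}]},
    {'Data': [{'VarCharValue': '3651'}, {'VarCharValue': '3'}]},
    {'Data': [{'VarCharValue': '4469'}, {'VarCharValue': '1'}]},
]
print(rows_to_dicts(rows))
# [{'postcode': '3651', 'CountOf': '3'}, {'postcode': '4469', 'CountOf': '1'}]
```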

Or is there a better way to do this?

Here is the json.dumps() of my Athena output:

{
  "ResultSet": {
    "Rows": [{
      "Data": [{
        "VarCharValue": "postcode"
      }, {
        "VarCharValue": "CountOf"
      }]
    }, {
      "Data": [{
        "VarCharValue": "1231"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "1166"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "3651"
      }, {
        "VarCharValue": "3"
      }]
    }, {
      "Data": [{
        "VarCharValue": "2171"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4697"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4450"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4469"
      }, {
        "VarCharValue": "1"
      }]
    }],
    "ResultSetMetadata": {
      "ColumnInfo": [{
        "Scale": 0,
        "Name": "postcode",
        "Nullable": "UNKNOWN",
        "TableName": "",
        "Precision": 2147483647,
        "Label": "postcode",
        "CaseSensitive": true,
        "SchemaName": "",
        "Type": "varchar",
        "CatalogName": "hive"
      }, {
        "Scale": 0,
        "Name": "CountOf",
        "Nullable": "UNKNOWN",
        "TableName": "",
        "Precision": 19,
        "Label": "CountOf",
        "CaseSensitive": false,
        "SchemaName": "",
        "Type": "bigint",
        "CatalogName": "hive"
      }]
    }
  },
  "ResponseMetadata": {
    "RetryAttempts": 0,
    "HTTPStatusCode": 200,
    "RequestId": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
    "HTTPHeaders": {
      "date": "Mon, 01 Oct 2018 04:51:14 GMT",
      "x-amzn-requestid": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
      "content-length": "1464",
      "content-type": "application/x-amz-json-1.1",
      "connection": "keep-alive"
    }
  }
}

The result I want is a JSON array like this:

[{
  "postcode": "2171",
  "CountOf": "2"
}, {
  "postcode": "4697",
  "CountOf": "2"
}, {
  "postcode": "1166",
  "CountOf": "2"
},
 ...
]

Answer 1

>>> import json
>>> def get_var_char_values(d):
...     return [obj['VarCharValue'] for obj in d['Data']]
...
>>> # input_data is the dict returned by get_query_results()
>>> header, *rows = input_data['ResultSet']['Rows']
>>> header = get_var_char_values(header)
>>> result = [dict(zip(header, get_var_char_values(row))) for row in rows]
>>> print(json.dumps(result, indent=2))
[
  {
    "postcode": "4450",
    "CountOf": "2"
  },
  {
    "postcode": "1231",
    "CountOf": "2"
  },
  {
    "postcode": "4469",
    "CountOf": "1"
  },
  {
    "postcode": "3651",
    "CountOf": "3"
  },
  {
    "postcode": "1166",
    "CountOf": "2"
  },
  {
    "postcode": "4697",
    "CountOf": "2"
  },
  {
    "postcode": "2171",
    "CountOf": "2"
  }
]
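The same idea can be extended to results that span several pages. As far as I know, get_query_results() returns at most 1000 rows per call, and only the first row of the first page is the header, so a sketch that walks over all pages might look like the following. iter_result_dicts is a hypothetical name; the header-only-on-first-page behaviour is an assumption worth checking against your own results; and .get('VarCharValue') is used so that a cell without that key (assumed to be a NULL) yields None instead of a KeyError.

```python
from typing import Iterable, Iterator


def iter_result_dicts(pages: Iterable[dict]) -> Iterator[dict]:
    """Yield one dict per data row across several get_query_results() pages."""
    header = None
    for page in pages:
        for row in page['ResultSet']['Rows']:
            # Missing 'VarCharValue' becomes None instead of raising KeyError
            values = [datum.get('VarCharValue') for datum in row['Data']]
            if header is None:
                # Assumption: only the very first row of the first page is the header
                header = values
                continue
            yield dict(zip(header, values))
```

With boto3, the pages could come from the Athena paginator, e.g. `pages = boto3.client('athena').get_paginator('get_query_results').paginate(QueryExecutionId=query_id)` followed by `list(iter_result_dicts(pages))`, where `query_id` is the hypothetical ID of a finished query. Because the function is a generator, rows are produced as pages arrive instead of all being held in memory at once.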


python

json

python-3.x

amazon-athena