Converting a nested JSON object to a pandas DataFrame
I am working with a JSON file that has nested fields (arrays), and I am trying to convert it to a pandas DataFrame.
{
"_id": "2026",
"dataDate": 1537920000000,
"dataYear": 2018,
"groupId": "1378",
"HourConsumed": 19781.4,
"HourGenerated": 0,
"max": 4658.400000000001,
"maxGen": 0,
"maxTime": 1538001000000,
"avg": -206.05625,
"max": 0,
"maxGen": 0,
"maxTime": null,
"avgTemp": 0,
"me_Id": "2004506_3166155129",
"interval": 15,
"intervalMetaData": [
"whC",
"whG",
"max",
"maxGen",
"hC",
"hG",
"maxVar",
"maxGen",
"avgTemp",
"eventTime"
],
"intervalData": [
[
175.2,
0,
700.8,
0,
0,
0,
0,
0,
0,
1537920900000
],
[
192,
0,
768,
0,
0,
0,
0,
0,
0,
1537921800000
],
[
191.39999999999998,
0,
765.5999999999999,
0,
0,
0,
0,
0,
0,
1537922700000
]
]
}
I need to create a separate column for each entry in intervalMetaData, then fill those columns with the values from intervalData. Is that possible?
You bet it's possible! It's as simple as:
df = pd.DataFrame(j['intervalData'], columns=j['intervalMetaData'])
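The one-liner above does not show where `j` comes from. Here is a minimal sketch, assuming `j` is the dict you get from parsing the JSON text with the standard `json` module. The `raw` string is a shortened, hypothetical version of the question's document, used only to keep the example self-contained:

```python
import json

import pandas as pd

# Hypothetical, shortened version of the JSON from the question.
raw = """
{
  "interval": 15,
  "intervalMetaData": ["whC", "whG", "eventTime"],
  "intervalData": [
    [175.2, 0, 1537920900000],
    [192, 0, 1537921800000]
  ]
}
"""

# Parse the JSON text into a plain Python dict.
j = json.loads(raw)

# Each inner list of intervalData becomes one row;
# intervalMetaData supplies the column names.
df = pd.DataFrame(j["intervalData"], columns=j["intervalMetaData"])
print(df)
```

If the JSON lives in a file, `json.load(open_file)` on the opened file object should give the same dict.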
If I understand correctly, you just need to set your columns properly when you load your list of lists with pandas:
import pandas as pd
data = {
"_id": "2026",
"dataDate": 1537920000000,
"dataYear": 2018,
"groupId": "1378",
"HourConsumed": 19781.4,
"HourGenerated": 0,
"max": 4658.400000000001,
"maxGen": 0,
"maxTime": 1538001000000,
"avg": -206.05625,
"max": 0,
"maxGen": 0,
"maxTime": None,
"avgTemp": 0,
"me_Id": "2004506_3166155129",
"interval": 15,
"intervalMetaData": [
"whC",
"whG",
"max",
"maxGen",
"hC",
"hG",
"maxVar",
"maxGen",
"avgTemp",
"eventTime"
],
"intervalData": [
[
175.2,
0,
700.8,
0,
0,
0,
0,
0,
0,
1537920900000
],
[
192,
0,
768,
0,
0,
0,
0,
0,
0,
1537921800000
],
[
191.39999999999998,
0,
765.5999999999999,
0,
0,
0,
0,
0,
0,
1537922700000
]
]
}
df = pd.DataFrame(data["intervalData"], columns=data["intervalMetaData"])
print(df)
Output:
     whC  whG    max  maxGen  hC  hG  maxVar  maxGen  avgTemp      eventTime
0  175.2    0  700.8       0   0   0       0       0        0  1537920900000
1  192.0    0  768.0       0   0   0       0       0        0  1537921800000
2  191.4    0  765.6       0   0   0       0       0        0  1537922700000
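The eventTime values look like Unix timestamps in milliseconds. That is an assumption, since the question does not say so. If it holds, a sketch like this turns them into proper datetimes; the small frame below is a hypothetical stand-in for `df`:

```python
import pandas as pd

# Hypothetical two-column stand-in for the frame built above.
df = pd.DataFrame(
    {
        "whC": [175.2, 192.0, 191.4],
        "eventTime": [1537920900000, 1537921800000, 1537922700000],
    }
)

# Assumption: eventTime is epoch time in milliseconds.
df["eventTime"] = pd.to_datetime(df["eventTime"], unit="ms")
print(df)
```

Under that assumption the rows come out 15 minutes apart, which is consistent with "interval": 15 in the source data.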
Edit: you can add the other keys as columns with a loop:
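for k, v in data.items():
    if k not in ["intervalData", "intervalMetaData"]:
        # Broadcast the scalar value to every row. Note: a key that matches
        # an existing column name (e.g. "max") overwrites that column.
        df[k] = v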
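One caveat with the loop: some top-level keys (such as "max") have the same name as an interval column, so assigning them replaces the interval values. Here is a sketch of one way around that, using a hypothetical `meta_` prefix (my choice, not something from the question) on a shortened copy of the data:

```python
import pandas as pd

# Hypothetical, shortened copy of the data from the question.
data = {
    "_id": "2026",
    "max": 0,
    "interval": 15,
    "intervalMetaData": ["whC", "max", "eventTime"],
    "intervalData": [
        [175.2, 700.8, 1537920900000],
        [192, 768, 1537921800000],
    ],
}

df = pd.DataFrame(data["intervalData"], columns=data["intervalMetaData"])

for k, v in data.items():
    if k not in ("intervalData", "intervalMetaData"):
        # Prefix top-level keys so they cannot clobber an interval column.
        df[f"meta_{k}"] = v

print(df)
```

Now the interval-level "max" column keeps its values, and the top-level value ends up in "meta_max".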