How to convert JSON to a DataFrame using json_normalize?
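I am trying to convert an API response from JSON into a pandas DataFrame. The problem is that the data is nested inside the JSON, so I do not get the right columns in my DataFrame.
The data is collected from an API and has the following format: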
{
"data": [
{
"timestamp": "2019-04-10T11:40:13.437Z",
"score": 87,
"sensors": [
{
"comp": "temp",
"value": 20.010000228881836
},
{
"comp": "humid",
"value": 34.4900016784668
},
{
"comp": "co2",
"value": 418
},
{
"comp": "voc",
"value": 166
},
{
"comp": "pm25",
"value": 4
},
{
"comp": "lux",
"value": 961.4000244140625
},
{
"comp": "spl_a",
"value": 45.70000076293945
}
],
"indices": [
{
"comp": "temp",
"value": -1
},
{
"comp": "humid",
"value": -2
},
{
"comp": "co2",
"value": 0
},
{
"comp": "voc",
"value": 0
},
{
"comp": "pm25",
"value": 0
}
]
},
{
"timestamp": "2019-04-10T11:40:03.413Z",
"score": 87,
"sensors": [
{
"comp": "temp",
"value": 20.040000915527344
},
{
"comp": "humid",
"value": 34.630001068115234
},
{
"comp": "co2",
"value": 418
},
{
"comp": "voc",
"value": 169
},
{
"comp": "pm25",
"value": 5
},
{
"comp": "lux",
"value": 960.2000122070312
},
{
"comp": "spl_a",
"value": 46
}
],
"indices": [
{
"comp": "temp",
"value": -1
},
{
"comp": "humid",
"value": -1
},
{
"comp": "co2",
"value": 0
},
{
"comp": "voc",
"value": 0
},
{
"comp": "pm25",
"value": 0
}
]
},
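Edit: you can now see more of the data set.
What I have tried so far: I converted the JSON to a dictionary and then normalized it with json_normalize. The code is as follows: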
data = r.json()
works_data = json_normalize(data=data['data'], record_path=['sensors'],meta=['timestamp'])
df = pd.DataFrame.from_dict(works_data)
print(df)
The result I get is:
comp value timestamp
0 temp 21.059999 2019-04-10T12:39:05.062Z
1 humid 31.250000 2019-04-10T12:39:05.062Z
2 co2 407.000000 2019-04-10T12:39:05.062Z
3 voc 136.000000 2019-04-10T12:39:05.062Z
4 pm25 3.000000 2019-04-10T12:39:05.062Z
5 lux 1302.099976 2019-04-10T12:39:05.062Z
6 spl_a 46.299999 2019-04-10T12:39:05.062Z
The result I need is as follows: the result (image in the original post).
Can someone help me?
You can reshape your works_data:
data = {
"data": [
{
"timestamp": "2019-04-10T11:40:13.437Z",
"score": 87,
"sensors": [
{
"comp": "temp",
"value": 20.010000228881836
},
{
"comp": "humid",
"value": 34.4900016784668
},
{
"comp": "co2",
"value": 418
},
{
"comp": "voc",
"value": 166
},
{
"comp": "pm25",
"value": 4
},
{
"comp": "lux",
"value": 961.4000244140625
},
{
"comp": "spl_a",
"value": 45.70000076293945
}
],
"indices": [
{
"comp": "temp",
"value": -1
},
{
"comp": "humid",
"value": -2
},
{
"comp": "co2",
"value": 0
},
{
"comp": "voc",
"value": 0
},
{
"comp": "pm25",
"value": 0
}
]
},
{
"timestamp": "2019-04-10T11:40:03.413Z",
"score": 87,
"sensors": [
{
"comp": "temp",
"value": 20.040000915527344
},
{
"comp": "humid",
"value": 34.630001068115234
},
{
"comp": "co2",
"value": 418
},
{
"comp": "voc",
"value": 169
},
{
"comp": "pm25",
"value": 5
},
{
"comp": "lux",
"value": 960.2000122070312
},
{
"comp": "spl_a",
"value": 46
}
],
"indices": [
{
"comp": "temp",
"value": -1
},
{
"comp": "humid",
"value": -1
},
{
"comp": "co2",
"value": 0
},
{
"comp": "voc",
"value": 0
},
{
"comp": "pm25",
"value": 0
}
]
}]}
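import pandas as pd

# json_normalize is exposed as pd.json_normalize since pandas 1.0; on older
# versions use `from pandas.io.json import json_normalize` instead.
rows = []
for each in data['data']:
    timestamp = each['timestamp']
    # One row per sensor, transposed so that the sensor names can become columns.
    temp_df = pd.json_normalize(data=each, record_path=['sensors']).T
    columns = list(temp_df.iloc[0])
    data_values = list(temp_df.iloc[1, :])
    rows.append(pd.DataFrame([data_values + [timestamp]], columns=columns + ['timestamp']))
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
df = pd.concat(rows, ignore_index=True)
print(df)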
Output:
print(df)
temp humid co2 ... lux spl_a timestamp
0 20.010000 34.490002 418.0 ... 961.400024 45.700001 2019-04-10T11:40:13.437Z
1 20.040001 34.630001 418.0 ... 960.200012 46.000000 2019-04-10T11:40:03.413Z
[2 rows x 8 columns]
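As an alternative to the loop, the long table that json_normalize already produces in the question can probably be reshaped in one step with DataFrame.pivot. The following is only a sketch: it assumes pandas 1.0 or later (for pd.json_normalize), uses a trimmed three-sensor sample rather than the full response, and the names long_df and wide_df are made up for illustration.

```python
import pandas as pd

# Trimmed sample in the same shape as the API response in the question.
data = {
    "data": [
        {
            "timestamp": "2019-04-10T11:40:13.437Z",
            "score": 87,
            "sensors": [
                {"comp": "temp", "value": 20.010000228881836},
                {"comp": "humid", "value": 34.4900016784668},
                {"comp": "co2", "value": 418},
            ],
        },
        {
            "timestamp": "2019-04-10T11:40:03.413Z",
            "score": 87,
            "sensors": [
                {"comp": "temp", "value": 20.040000915527344},
                {"comp": "humid", "value": 34.630001068115234},
                {"comp": "co2", "value": 418},
            ],
        },
    ]
}

# Long format: one row per (timestamp, sensor) pair, as in the question.
long_df = pd.json_normalize(data["data"], record_path=["sensors"], meta=["timestamp"])

# Wide format: one row per timestamp, one column per sensor name.
wide_df = (
    long_df.pivot(index="timestamp", columns="comp", values="value")
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover "comp" label on the columns
)
print(wide_df)
```

Note that pivot sorts the rows by timestamp and the sensor columns alphabetically, and it raises an error if the same sensor appears twice for one timestamp; in that case pivot_table with an aggregation function may be a better fit.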