Normalize nested JSON data with Pandas/Python
I'm trying to normalize sample data like this:
{
"2018-04-26 10:09:33": [
{
"user_id": "M8BE957ZA",
"ts": "2018-04-26 10:06:33",
"message": "Hello"
}
],
"2018-04-27 19:10:55": [
{
"user_id": "M5320QS1X",
"ts": "2018-04-27 19:10:55",
"message": "Thank you"
}
]
}
I know I can use json_normalize(data, '2018-04-26 10:09:33', record_prefix='') to create a table in pandas, but the date/time key keeps changing. How can I normalize it so that I get the following? Any suggestions?
                     user_id    ts                   message
2018-04-26 10:09:33  M8BE957ZA  2018-04-26 10:06:33  Hello
2018-04-27 19:10:55  M5320QS1X  2018-04-27 19:10:55  Thank you
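The record_path idea from the question can also be made to work without hard-coding a timestamp, by looping over every key and stacking the results. This is a minimal sketch, not the asker's code: it assumes pandas >= 1.0 (for pd.json_normalize) and that the JSON has already been parsed into a dict; the names data and frames are illustrative.

```python
import pandas as pd

# Sample input from the question, already parsed into a dict (assumption).
data = {
    "2018-04-26 10:09:33": [
        {"user_id": "M8BE957ZA", "ts": "2018-04-26 10:06:33", "message": "Hello"}
    ],
    "2018-04-27 19:10:55": [
        {"user_id": "M5320QS1X", "ts": "2018-04-27 19:10:55", "message": "Thank you"}
    ],
}

# Normalize the record list under each key separately instead of hard-coding one key.
frames = [pd.json_normalize(data, record_path=key) for key in data]

# Stack the pieces; keys= puts each timestamp in an outer index level,
# droplevel(1) discards the inner per-key row counter, and rename_axis
# names the remaining index "date".
df = pd.concat(frames, keys=list(data)).droplevel(1).rename_axis("date")
print(df)
```

Because the timestamp ends up as the index, the layout matches the desired output above; call df.reset_index() if you prefer date as an ordinary column.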
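import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

test = {
"2018-04-26 10:09:33": [
{
"user_id": "M8BE957ZA",
"ts": "2018-04-26 10:06:33",
"message": "Hello"
}
],
"2018-04-27 19:10:55": [
{
"user_id": "M5320QS1X",
"ts": "2018-04-27 19:10:55",
"message": "Thank you"
}
]}
# One column per timestamp key, then melt into one row per key:
# 'variable' holds the key, 'value' holds the record dict.
df = pd.DataFrame(test).melt()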
variable value
0 2018-04-26 10:09:33 {'user_id': 'M8BE957ZA', 'ts': '2018-04-26 10:...
1 2018-04-27 19:10:55 {'user_id': 'M5320QS1X', 'ts': '2018-04-27 19:...
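Build a DataFrame from your dict, then melt it to get the structure above. Next, call json_normalize on the value column and join the result back onto the variable column, like this: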
df.join(json_normalize(df['value'].tolist())).drop(columns='value').rename(columns={'variable': 'date'})
date user_id ts message
0 2018-04-26 10:09:33 M8BE957ZA 2018-04-26 10:06:33 Hello
1 2018-04-27 19:10:55 M5320QS1X 2018-04-27 19:10:55 Thank you
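An alternative that is not part of the original answer: flatten the dict in plain Python and build the frame directly. This is a sketch assuming the same test dict as above. One reason to consider it: pd.DataFrame(test) in the melt approach requires every list to have the same length (it raises a ValueError otherwise), whereas the comprehension below handles any number of records per timestamp.

```python
import pandas as pd

# Same sample dict as in the answer above.
test = {
    "2018-04-26 10:09:33": [
        {"user_id": "M8BE957ZA", "ts": "2018-04-26 10:06:33", "message": "Hello"}
    ],
    "2018-04-27 19:10:55": [
        {"user_id": "M5320QS1X", "ts": "2018-04-27 19:10:55", "message": "Thank you"}
    ],
}

# One output row per record, tagged with the timestamp key it was stored under.
# If a record itself contained a "date" field, it would overwrite the key here.
rows = [{"date": date, **record} for date, records in test.items() for record in records]
df = pd.DataFrame(rows)
print(df)
```

The result has the same date, user_id, ts and message columns as the melt/join version, without the intermediate melt and join steps.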