Python Flatten Deep Nested JSON
I have the following JSON structure:
{
  "comments_v2": [
    {
      "timestamp": 1196272984,
      "data": [
        {
          "comment": {
            "timestamp": 1196272984,
            "comment": "OSI Beach Party Weekend, CA",
            "author": "xxxx"
          }
        }
      ],
      "title": "xxxx commented on his own photo."
    },
    {
      "timestamp": 1232918783,
      "data": [
        {
          "comment": {
            "timestamp": 1232918783,
            "comment": "We'll see about that.",
            "author": "xxxx"
          }
        }
      ]
    }
  ]
}
I am trying to flatten this JSON into a pandas dataframe. This is my solution:
import codecs
import pandas as pd

# Read file
df = pd.read_json(codecs.open(infile, "r", "utf-8-sig"))
# Normalize the top-level list, then expand the nested "data" column
df = pd.json_normalize(df["comments_v2"])
child_column = pd.json_normalize(df["data"])
child_column = pd.concat([child_column.drop([0], axis=1), child_column[0].apply(pd.Series)], axis=1)
# Join the expanded comment fields back onto the parent rows
df_merge = df.join(child_column)
df_merge.drop(["data"], axis=1, inplace=True)
The resulting dataframe looks like this:
timestamp | title | comment.timestamp | comment.comment | comment.author | comment.group |
---|---|---|---|---|---|
1196272984 | xxxx commented on his own photo | 1196272984 | OSI Beach Party Weekend, CA | XXXXX | NaN |
Is there an easier way to flatten the JSON to get the result shown above?
Thank you!
Use record_path='data' as an argument to pd.json_normalize:
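import json
import codecs
import pandas as pd

# utf-8-sig strips a leading BOM if the file has one
with codecs.open(infile, 'r', 'utf-8-sig') as jsonfile:
    data = json.load(jsonfile)
# The second positional argument is record_path
df = pd.json_normalize(data['comments_v2'], 'data')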
Output:
>>> df
comment.timestamp comment.comment comment.author
0 1196272984 OSI Beach Party Weekend, CA xxxx
1 1232918783 We'll see about that. xxxx
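The output above keeps only the nested comment fields. If the parent-level timestamp and title columns from the question are wanted as well, pd.json_normalize also accepts a meta argument. The following is a minimal sketch, assuming the JSON has already been loaded into a dict (inlined here as data instead of being read from infile). errors="ignore" is passed because the second entry has no title key, which would otherwise raise a KeyError.

```python
import pandas as pd

# Sample data copied from the question, inlined so the sketch is self-contained.
data = {
    "comments_v2": [
        {
            "timestamp": 1196272984,
            "data": [
                {
                    "comment": {
                        "timestamp": 1196272984,
                        "comment": "OSI Beach Party Weekend, CA",
                        "author": "xxxx",
                    }
                }
            ],
            "title": "xxxx commented on his own photo.",
        },
        {
            "timestamp": 1232918783,
            "data": [
                {
                    "comment": {
                        "timestamp": 1232918783,
                        "comment": "We'll see about that.",
                        "author": "xxxx",
                    }
                }
            ],
        },
    ]
}

df = pd.json_normalize(
    data["comments_v2"],
    record_path="data",           # one output row per item in each "data" list
    meta=["timestamp", "title"],  # parent-level fields repeated on each row
    errors="ignore",              # a missing meta key ("title") becomes NaN
)
print(df)
```

The meta columns are appended after the record columns, so the column order differs from the table in the question. Reorder with df[[...]] if that matters.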
Try flatten_json (in this example the JSON is assigned to js):
from flatten_json import flatten
import pandas as pd

# Flatten each top-level entry, joining nested keys with '.'
dic_flattened = (flatten(d, '.') for d in list(js['comments_v2']))
df = pd.DataFrame(dic_flattened)
df
timestamp data.0.comment.timestamp data.0.comment.comment data.0.comment.author title
0 1196272984 1196272984 OSI Beach Party Weekend, CA xxxx xxxx commented on his own photo.
1 1232918783 1232918783 We'll see about that. xxxx NaN
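If installing a third-party package is not an option, the same kind of key flattening can be sketched by hand. The helper below, flatten_record, is a hypothetical name and is not flatten_json's actual implementation. It only illustrates the idea of joining nested dict keys and list indices with a separator, and it silently drops empty dicts and lists.

```python
import pandas as pd


def flatten_record(obj, sep=".", prefix=""):
    """Hypothetical helper: flatten nested dicts/lists into one flat dict."""
    if isinstance(obj, dict):
        pairs = obj.items()
    elif isinstance(obj, list):
        pairs = enumerate(obj)  # list positions become key segments
    else:
        return {prefix: obj}    # leaf value: emit it under the built-up key
    items = {}
    for key, value in pairs:
        new_key = f"{prefix}{sep}{key}" if prefix else str(key)
        items.update(flatten_record(value, sep, new_key))
    return items


# Sample data copied from the question.
js = {
    "comments_v2": [
        {
            "timestamp": 1196272984,
            "data": [{"comment": {"timestamp": 1196272984,
                                  "comment": "OSI Beach Party Weekend, CA",
                                  "author": "xxxx"}}],
            "title": "xxxx commented on his own photo.",
        },
        {
            "timestamp": 1232918783,
            "data": [{"comment": {"timestamp": 1232918783,
                                  "comment": "We'll see about that.",
                                  "author": "xxxx"}}],
        },
    ]
}

df = pd.DataFrame([flatten_record(d) for d in js["comments_v2"]])
print(df)
```

Because list indices end up in the column names (data.0.comment.author), this approach yields one row per top-level entry. Entries with more than one item in data would produce extra data.1.* columns instead of extra rows.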