Nested JSON: pandas.json_normalize and the error unhashable type: 'dict'
I get a nested JSON from a requests response, for example:
{
'code': 0,
'daily_stats': [{'consume_data': {'fans_go_detail_count': 0,
'fans_impression_count': 215,
'fans_play_count': 7,
'go_detail_count': 0,
'impression_count': 226,
'play_count': 8},
'date': '2020-06-22'}],
'jump_rate': [],
'message': 'success',
'total_stat': {'consume_data': {'fans_go_detail_count': 0,
'fans_impression_count': 215,
'fans_play_count': 7,
'go_detail_count': 0,
'impression_count': 226,
'play_count': 8},
'consume_detail': {'click_rate': 0.035398230088495575,
'read_complete_rate': 0,
'read_duration': 111},
'fans_change_count': 0,
'fans_data': {},
'interaction_data': {},
'ranking_data': {}}}
I want a flattened df like this:
date, daily_stats.consume_data.fans_go_detail_count, consume_detail.click_rate, etc.
Feeding it to pandas.json_normalize, I get:
df = pd.json_normalize(r.json())
list(df)
['code',
'daily_stats',
'jump_rate',
'message',
'total_stat.consume_data.fans_go_detail_count',
'total_stat.consume_data.fans_impression_count',
'total_stat.consume_data.fans_play_count',
'total_stat.consume_data.go_detail_count',
'total_stat.consume_data.impression_count',
'total_stat.consume_data.play_count',
'total_stat.consume_detail.click_rate',
'total_stat.consume_detail.read_complete_rate',
'total_stat.consume_detail.read_duration',
'total_stat.fans_change_count']
Problems:
- 'daily_stats' and 'jump_rate' are still packed in lists, e.g.:
df['daily_stats']
0 [{'consume_data': {'fans_go_detail_count': 0, ...
Name: daily_stats, dtype: object
- Empty fields such as 'fans_data': {}, 'interaction_data': {}, 'ranking_data': {} are missing.
I tried adding record_path=r.json()['daily_stats'], and then I got:
unhashable type: 'dict'
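Of course I could unpack each piece in a loop into separate dfs, join them, and flatten the result by hand, but I feel there should be an easier way to do this.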
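A note on the error: record_path expects the name of a key (or a list of key names) that leads to the list of records, not the records themselves. Passing the list of dicts makes pandas try to use each dict as a lookup key, which is most likely where unhashable type: 'dict' comes from. The sketch below passes the key name instead and pulls a few top-level values in through meta. The variable data is a trimmed, hand-copied stand-in for r.json(), and the choice of meta fields and the "daily_stats." prefix are illustrative assumptions, not something the original post specifies.

```python
import pandas as pd

# Trimmed copy of the response from the question (stands in for r.json()).
data = {
    "code": 0,
    "daily_stats": [
        {
            "consume_data": {"impression_count": 226, "play_count": 8},
            "date": "2020-06-22",
        }
    ],
    "jump_rate": [],
    "message": "success",
    "total_stat": {
        "consume_detail": {"click_rate": 0.035398230088495575, "read_duration": 111},
        "fans_change_count": 0,
    },
}

df = pd.json_normalize(
    data,
    record_path="daily_stats",  # key NAME of the list of records, not the list itself
    meta=[
        "code",
        "message",
        ["total_stat", "consume_detail", "click_rate"],  # nested path to a scalar
        ["total_stat", "fans_change_count"],
    ],
    record_prefix="daily_stats.",  # prefix the record columns to avoid name clashes
)

print(sorted(df.columns))
```

Each entry of daily_stats becomes one row, and the meta values are repeated on every row, so this shape should also hold when the response covers several dates.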
- Given r as a dict:
import pandas as pd
# load r into a dataframe
df = pd.json_normalize(r)
# explode the columns with lists
df = df.apply(lambda x: x.explode()).reset_index(drop=True)
# expand the dicts in daily_stats and join them to df
df = df.join(pd.json_normalize(df.daily_stats)).drop(columns=['daily_stats'])
# display(df)
code jump_rate message total_stat.consume_data.fans_go_detail_count total_stat.consume_data.fans_impression_count total_stat.consume_data.fans_play_count total_stat.consume_data.go_detail_count total_stat.consume_data.impression_count total_stat.consume_data.play_count total_stat.consume_detail.click_rate total_stat.consume_detail.read_complete_rate total_stat.consume_detail.read_duration total_stat.fans_change_count date consume_data.fans_go_detail_count consume_data.fans_impression_count consume_data.fans_play_count consume_data.go_detail_count consume_data.impression_count consume_data.play_count
0 0 NaN success 0 215 7 0 226 8 0.035398 0 111 0 2020-06-22 0 215 7 0 226 8
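The result above still lacks the empty fields ('fans_data', 'interaction_data', 'ranking_data'), because json_normalize produces no column for a dict that has no keys. One possible workaround, sketched here under assumptions, is to flatten the nested part by hand and store None for every empty dict so the column survives. The helper name flatten and the trimmed total_stat sample are hypothetical and only for illustration.

```python
import pandas as pd


def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts; an empty dict becomes a None value."""
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            if value:
                # Non-empty dict: descend and merge its flattened keys.
                items.update(flatten(value, new_key, sep))
            else:
                # Empty dict: keep a placeholder so the column is not lost.
                items[new_key] = None
        else:
            items[new_key] = value
    return items


# Trimmed copy of the 'total_stat' part of the response from the question.
total_stat = {
    "consume_detail": {"click_rate": 0.035398230088495575, "read_duration": 111},
    "fans_change_count": 0,
    "fans_data": {},
    "interaction_data": {},
    "ranking_data": {},
}

df_total = pd.DataFrame([flatten({"total_stat": total_stat})])
print(list(df_total.columns))
```

The placeholder columns hold only missing values here, so they can be filled or dropped later once real data arrives in those fields.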
Other resources
- Splitting dictionary/list inside a Pandas Column into Separate Columns