Nested JSON: pandas.json_normalize and the error "unhashable type: 'dict'"

I get a nested JSON from a request response, for example:

{
 'code': 0,
 'daily_stats': [{'consume_data': {'fans_go_detail_count': 0,
                                   'fans_impression_count': 215,
                                   'fans_play_count': 7,
                                   'go_detail_count': 0,
                                   'impression_count': 226,
                                   'play_count': 8},
                  'date': '2020-06-22'}],
 'jump_rate': [],
 'message': 'success',
 'total_stat': {'consume_data': {'fans_go_detail_count': 0,
                                 'fans_impression_count': 215,
                                 'fans_play_count': 7,
                                 'go_detail_count': 0,
                                 'impression_count': 226,
                                 'play_count': 8},
                'consume_detail': {'click_rate': 0.035398230088495575,
                                   'read_complete_rate': 0,
                                   'read_duration': 111},
                'fans_change_count': 0,
                'fans_data': {},
                'interaction_data': {},
                'ranking_data': {}}}

I would like a flattened df with columns like this:

date, daily_stats.consume_data.fans_go_detail_count, consume_detail.click_rate, etc.

Feeding the response to pandas.json_normalize, I get:


df = pd.json_normalize(r.json())
list(df)

['code',
 'daily_stats',
 'jump_rate',
 'message',
 'total_stat.consume_data.fans_go_detail_count',
 'total_stat.consume_data.fans_impression_count',
 'total_stat.consume_data.fans_play_count',
 'total_stat.consume_data.go_detail_count',
 'total_stat.consume_data.impression_count',
 'total_stat.consume_data.play_count',
 'total_stat.consume_detail.click_rate',
 'total_stat.consume_detail.read_complete_rate',
 'total_stat.consume_detail.read_duration',
 'total_stat.fans_change_count']

Problems:

  1. 'daily_stats' and 'jump_rate' are still packed in lists, e.g.:
df['daily_stats']

0    [{'consume_data': {'fans_go_detail_count': 0, ...
Name: daily_stats, dtype: object
  2. Empty fields such as 'fans_data': {}, 'interaction_data': {} and 'ranking_data': {} are missing.

I tried adding record_path=r.json['daily_stats'], and then I get:

unhashable type: 'dict'

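Of course I could unpack each level manually in a loop into separate dfs, join them, and flatten the result, but I feel there should be an easier way to do this.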
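A likely cause of the error: record_path expects the key name (or a list of key names) that leads to the list of records, not the list itself. Passing the list of dicts makes pandas try to use a dict as a lookup key, which would explain "unhashable type: 'dict'". A minimal sketch, assuming a shortened stand-in for the response (the data dict below is trimmed, and the second day is made up for illustration):

```python
import pandas as pd

# Shortened stand-in for r.json(); the second day is made up for illustration.
data = {
    "code": 0,
    "message": "success",
    "daily_stats": [
        {"consume_data": {"play_count": 8, "impression_count": 226},
         "date": "2020-06-22"},
        {"consume_data": {"play_count": 3, "impression_count": 90},
         "date": "2020-06-23"},
    ],
}

# record_path names the key that holds the list of records;
# meta lists top-level fields to repeat on every resulting row;
# record_prefix keeps the record columns recognisable.
daily = pd.json_normalize(
    data,
    record_path="daily_stats",
    meta=["code", "message"],
    record_prefix="daily_stats.",
)

print(sorted(daily.columns))
```

This gives one row per entry in 'daily_stats', with the nested 'consume_data' keys already flattened into dotted column names.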

  • Given r as a dict
import pandas as pd

# load r into a dataframe
df = pd.json_normalize(r)

# explode the columns with lists
df = df.apply(lambda x: x.explode()).reset_index(drop=True)

# expand the dicts in daily_stats and join them to df
df = df.join(pd.json_normalize(df.daily_stats)).drop(columns=['daily_stats'])

# display(df)
   code jump_rate  message  total_stat.consume_data.fans_go_detail_count  total_stat.consume_data.fans_impression_count  total_stat.consume_data.fans_play_count  total_stat.consume_data.go_detail_count  total_stat.consume_data.impression_count  total_stat.consume_data.play_count  total_stat.consume_detail.click_rate  total_stat.consume_detail.read_complete_rate  total_stat.consume_detail.read_duration  total_stat.fans_change_count        date  consume_data.fans_go_detail_count  consume_data.fans_impression_count  consume_data.fans_play_count  consume_data.go_detail_count  consume_data.impression_count  consume_data.play_count
0     0       NaN  success                                             0                                            215                                        7                                        0                                       226                                   8                              0.035398                                             0                                      111                             0  2020-06-22                                  0                                 215                             7                             0                            226                        8
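
The steps above do not bring back the empty fields from problem 2, since json_normalize emits no column for an empty dict. One possible workaround, sketched under the assumption that a None placeholder is acceptable, is to replace empty dicts before normalizing. The helper name fill_empty_dicts is hypothetical, not a pandas function, and the r dict is a shortened stand-in for the response in the question:

```python
import pandas as pd


def fill_empty_dicts(obj, placeholder=None):
    """Recursively replace empty dicts with a placeholder so they survive flattening."""
    if isinstance(obj, dict):
        if not obj:
            return placeholder
        return {k: fill_empty_dicts(v, placeholder) for k, v in obj.items()}
    if isinstance(obj, list):
        return [fill_empty_dicts(v, placeholder) for v in obj]
    return obj


# Shortened stand-in for the response in the question.
r = {
    "code": 0,
    "total_stat": {"fans_change_count": 0, "fans_data": {}, "ranking_data": {}},
}

df = pd.json_normalize(fill_empty_dicts(r))
print(list(df))
```

With the placeholder in place, 'total_stat.fans_data' and 'total_stat.ranking_data' show up as columns holding missing values instead of disappearing.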

Other resources

  • Splitting dictionary/list inside a Pandas Column into Separate Columns