How to normalize a JSON file containing a list (that should be kept as a list) in Python | Pandas?

I am trying to convert a JSON file to a DataFrame using the json_normalize function.

Source JSON

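The problem is that one of the future columns contains a list (and it should stay a list), but including this column in the meta argument of json_normalize raises the following error: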

ValueError: operands could not be broadcast together with shape (22,) (11,)

The error occurs when I add 'teams' to the meta list in the following code:

pd.json_normalize(data, 'sites', ['sport_key', 'sport_nice', 'home_team', 'teams'])

Assuming data is a list of dictionaries, you can still use json_normalize, but you have to assign the teams column separately for each corresponding dictionary in data:

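import pandas as pd

def normalize(d):
    # Normalize the sites of one record, then attach that record's teams
    # list to every resulting row (one copy of the list per site).
    return pd.json_normalize(d, 'sites', ['sport_key', 'sport_nice', 'home_team'])\
           .assign(teams=[d['teams']]*len(d['sites']))


df = pd.concat([normalize(d) for d in data], ignore_index=True)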
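For anyone who wants to try the per-record approach without the original file, here is a minimal self-contained sketch. The field names follow the question, but the sample records are made up for illustration (the second game and its site are purely hypothetical), so treat it as a demonstration under those assumptions rather than a reproduction of the real data.

```python
import pandas as pd

# Illustrative sample shaped like the question's data; the values are assumptions.
data = [
    {
        "sport_key": "basketball_ncaab",
        "sport_nice": "NCAAB",
        "home_team": "Bryant Bulldogs",
        "teams": ["Bryant Bulldogs", "Wagner Seahawks"],
        "sites": [
            {"site_key": "marathonbet", "site_nice": "Marathon Bet",
             "last_update": 1608156452, "odds": {"h2h": [1.28, 3.54]}},
            {"site_key": "sport888", "site_nice": "888sport",
             "last_update": 1608156452, "odds": {"h2h": [1.13, 5.8]}},
            {"site_key": "unibet", "site_nice": "Unibet",
             "last_update": 1608156434, "odds": {"h2h": [1.13, 5.8]}},
        ],
    },
    {
        # Hypothetical second game, included only to exercise pd.concat.
        "sport_key": "basketball_ncaab",
        "sport_nice": "NCAAB",
        "home_team": "Team A",
        "teams": ["Team A", "Team B"],
        "sites": [
            {"site_key": "unibet", "site_nice": "Unibet",
             "last_update": 1608156434, "odds": {"h2h": [1.5, 2.5]}},
        ],
    },
]


def normalize(d):
    # Flatten the sites of one record and carry the scalar fields as meta.
    flat = pd.json_normalize(d, "sites", ["sport_key", "sport_nice", "home_team"])
    # Attach the teams list to each row; one copy of the list per site.
    return flat.assign(teams=[d["teams"]] * len(d["sites"]))


df = pd.concat([normalize(d) for d in data], ignore_index=True)
print(df[["site_key", "home_team", "teams"]])
```

Each row ends up holding the full teams list of the game it came from, because the list is repeated once per site before the frames are concatenated.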

Alternatively, you can try:

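# Join each teams list into one string so it can be passed through meta,
# then split it back into a list (assumes no team name contains a comma).
data = [{**d, 'teams': ','.join(d['teams'])} for d in data]
df = pd.json_normalize(data, 'sites', ['sport_key', 'sport_nice', 'home_team', 'teams'])
df['teams'] = df['teams'].str.split(',')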

Result:

      site_key     site_nice  last_update      odds.h2h         sport_key sport_nice        home_team                               teams
0  marathonbet  Marathon Bet   1608156452  [1.28, 3.54]  basketball_ncaab      NCAAB  Bryant Bulldogs  [Bryant Bulldogs, Wagner Seahawks]
1     sport888      888sport   1608156452   [1.13, 5.8]  basketball_ncaab      NCAAB  Bryant Bulldogs  [Bryant Bulldogs, Wagner Seahawks]
2       unibet        Unibet   1608156434   [1.13, 5.8]  basketball_ncaab      NCAAB  Bryant Bulldogs  [Bryant Bulldogs, Wagner Seahawks]
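One caveat with the join/split variant: if a team name ever contains a comma, splitting on ',' will break it apart. A possible variation, which is not part of the original answer, is to serialize the list with json.dumps before normalizing and restore it with json.loads afterwards. The sample record below is hypothetical and only serves to show a name with a comma surviving the round trip.

```python
import json

import pandas as pd

# Hypothetical record; "Team A, East" deliberately contains a comma.
data = [
    {
        "sport_key": "basketball_ncaab",
        "sport_nice": "NCAAB",
        "home_team": "Team A, East",
        "teams": ["Team A, East", "Team B"],
        "sites": [
            {"site_key": "site_one", "odds": {"h2h": [1.5, 2.5]}},
            {"site_key": "site_two", "odds": {"h2h": [1.4, 2.7]}},
        ],
    }
]

# Encode each teams list as a JSON string so meta only sees scalar values.
encoded = [{**d, "teams": json.dumps(d["teams"])} for d in data]
df = pd.json_normalize(
    encoded, "sites", ["sport_key", "sport_nice", "home_team", "teams"]
)
# Decode the JSON strings back into Python lists.
df["teams"] = df["teams"].apply(json.loads)
print(df[["site_key", "teams"]])
```

The trade-off is an extra encode/decode pass, in exchange for not depending on which characters appear in the team names.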