Pandas JSON Normalize - Choose Correct Record Path

Question

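I'm trying to figure out how to normalize the nested JSON response sampled below.

Right now, json_normalize(res, record_path=['data']) gives me most of the data I need, but what I really want is the details inside the "session_pageviews" list/dict, together with the fields of the enclosing data list/dict.

I tried json_normalize(res, record_path=['data', ['session_pageviews']], meta=['data']) but I get this error: ValueError: operands could not be broadcast together with shape (32400,) (180,)

I also tried json_normalize(res, record_path=['data'], max_level=1) but that did not unnest session_pageviews.

Any help would be appreciated!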

Answer 1

You can try applying the following function to the DataFrame built from your JSON:

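import pandas as pd


def columns_of_type(df, columns, kind):
    # Names of the given columns in which every value is an instance of `kind`.
    if df.empty:
        return []
    return [c for c in columns if df[c].map(lambda v: isinstance(v, kind)).all()]


def flatten_nested_json_df(df):
    # Keep the original row number in an "index" column so exploded rows stay traceable.
    df = df.reset_index()
    list_columns = columns_of_type(df, df.columns, list)
    dict_columns = columns_of_type(df, df.columns, dict)

    while list_columns or dict_columns:
        new_columns = []

        for col in dict_columns:
            # Spread each dict into its own columns, prefixed with the parent column name.
            horiz_exploded = pd.json_normalize(df[col].tolist()).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)

        for col in list_columns:
            # One row per list element. Resetting the index keeps row labels unique,
            # so exploding a second list column does not multiply rows incorrectly.
            df = df.explode(col).reset_index(drop=True)
            new_columns.append(col)

        # Only the columns created in this pass can still hold nested values.
        list_columns = columns_of_type(df, new_columns, list)
        dict_columns = columns_of_type(df, new_columns, dict)
    return df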
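
To make the loop easier to follow, here is a rough sketch of what a single pass does to one list-of-dicts column. The column names (id, views, page) and the values are made up for illustration; they are not from the response in the question.

```python
import pandas as pd

# Hypothetical frame: one list of dicts per row.
df = pd.DataFrame({
    "id": [1, 2],
    "views": [[{"page": "/a"}, {"page": "/b"}], [{"page": "/c"}]],
})

# Step 1: explode the list column so every list element gets its own row.
step1 = df.explode("views").reset_index(drop=True)

# Step 2: spread the dicts into prefixed columns and drop the nested column.
expanded = pd.json_normalize(step1["views"].tolist()).add_prefix("views.")
step2 = pd.concat([step1.drop(columns=["views"]), expanded], axis=1)

print(step2)
```

The function repeats these two steps until no column consists entirely of lists or entirely of dicts.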

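Call it like this:

df1 = flatten_nested_json_df(df)

where

df = pd.json_normalize(json)

and json is your parsed response (res in the question). This should give you all the information contained in the JSON, fully flattened.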
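
If you only need one row per pageview plus a few fields from the enclosing data record, json_normalize may be enough on its own. The ValueError in the question most likely comes from meta=['data']: that asks pandas to attach the entire data list to every row, whereas meta should name individual fields, each given as a path from the top of the document. A minimal sketch, assuming each record has fields such as session_id and user (these names and the sample values are made up, since the real response isn't shown):

```python
import pandas as pd

# Hypothetical response with the same overall shape as in the question.
res = {
    "data": [
        {
            "session_id": "s1",
            "user": "alice",
            "session_pageviews": [
                {"page": "/home", "duration": 12},
                {"page": "/docs", "duration": 40},
            ],
        },
        {
            "session_id": "s2",
            "user": "bob",
            "session_pageviews": [{"page": "/home", "duration": 5}],
        },
    ]
}

# record_path walks down to the innermost list; each meta entry is a path
# to a field of the enclosing record, starting from the top-level key.
pageviews = pd.json_normalize(
    res,
    record_path=["data", "session_pageviews"],
    meta=[["data", "session_id"], ["data", "user"]],
)

print(pageviews)
```

The meta columns come out named after their joined path (data.session_id, data.user). Every field you want has to be listed by hand, which is the trade-off compared with the generic flatten function above.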

python

pandas

json-normalize