Is there a way to un-nest a pandas DataFrame in a python3 Jupyter notebook?

Question

I am importing a json file into a python3 Jupyter notebook. The json file has the following structure:

object
- rooms [26 elements]
  - 0
    - turns
      - fromBathroom
      - fromParking
    - distances
      - dfromBathroom
      - dfromParking
    - depth
    - area
  - 1
    - ..... etc.
- name

I am importing the json file this way:

import json

import numpy as np
import pandas as pd

with open("rooms.json") as file:
    data = json.load(file)

# pd.json_normalize is the top-level function since pandas 1.0;
# the older "from pandas.io.json import json_normalize" is deprecated.
df = pd.json_normalize(data['rooms'])

I am now trying to plot each of the 6 dimensions against every other one in a matrix-like layout, for a total of 36 plots.

This is what I am trying:

col_features = ['fromBathroom', 'fromParking', 'dfromBathroom', 'dfromParking', 'depth', 'area']
pd.plotting.scatter_matrix(df[col_features], alpha = .2, figsize = (14,8))

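This does not work; I get the error: KeyError: "['fromBathroom' 'fromParking' 'dfromBathroom' 'dfromParking'] not in index"

This is because those features are nested inside 'turns' and 'distances' in the json file. Is there a way to un-nest these features so that I can index into the DataFrame for their values the same way I can for depth and area?

Thanks for any insight.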

Answer 1

Maybe you can extract df1 = df['turns'], df2 = df['distances'] and df3 = df[['area', 'depth']], and then do df4 = pd.concat([df1, df2, df3], join='inner', axis=1) (see the pandas documentation for concat).

Or directly: pd.concat([df['turns'], df['distances'], df[['area', 'depth']]], join='inner', axis=1)

Edit:

I tried something, hopefully it is what you are looking for:
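To make the concat idea concrete, here is a minimal sketch. It assumes the nested values sit in the frame as one dict per cell, which is what you get when building it with pd.DataFrame instead of json_normalize. The `rooms` list is a made-up stand-in for data['rooms'], and the names `raw`, `turns`, `distances` and `flat` are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for data['rooms']; the values are invented.
rooms = [
    {"turns": {"fromBathroom": 1, "fromParking": 3},
     "distances": {"dfromBathroom": 4.0, "dfromParking": 12.5},
     "depth": 2, "area": 20.0},
    {"turns": {"fromBathroom": 2, "fromParking": 1},
     "distances": {"dfromBathroom": 7.5, "dfromParking": 3.0},
     "depth": 3, "area": 15.5},
]

# pd.DataFrame keeps each nested dict in a single cell.
raw = pd.DataFrame(rooms)

# Expand each column of dicts into its own frame, one column per key.
turns = pd.DataFrame(raw["turns"].tolist(), index=raw.index)
distances = pd.DataFrame(raw["distances"].tolist(), index=raw.index)

# Put the expanded frames next to the columns that were already flat.
flat = pd.concat([turns, distances, raw[["depth", "area"]]],
                 axis=1, join="inner")
print(flat.columns.tolist())
```

After this, flat can be indexed with the six feature names from the question.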

link to the image with the code and the results I get with Jupyter

df1 = df['turns']
df2 = df['distances']
df3 = pd.DataFrame(df['depth'])
df4 = pd.DataFrame(df['area'])
df_recomposed = pd.concat([df1, df2, df3, df4], join='inner', axis=1)
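The snippet above relies on df['turns'] returning a sub-frame. If df comes straight from json_normalize, as in the question, the nested keys are normally flattened into dotted column names such as 'turns.fromBathroom' (with the default sep='.'), and stripping the prefix may be enough. A rough sketch under that assumption, again with a made-up `rooms` list standing in for data['rooms']:

```python
import pandas as pd

# Hypothetical stand-in for data['rooms']; the values are invented.
rooms = [
    {"turns": {"fromBathroom": 1, "fromParking": 3},
     "distances": {"dfromBathroom": 4.0, "dfromParking": 12.5},
     "depth": 2, "area": 20.0},
    {"turns": {"fromBathroom": 2, "fromParking": 1},
     "distances": {"dfromBathroom": 7.5, "dfromParking": 3.0},
     "depth": 3, "area": 15.5},
]

df = pd.json_normalize(rooms)
# Nested keys show up as 'parent.child' column names.
print(df.columns.tolist())

# Keep only the part after the last dot; flat names are left unchanged.
df_flat = df.rename(columns=lambda name: name.rsplit(".", 1)[-1])

col_features = ["fromBathroom", "fromParking", "dfromBathroom",
                "dfromParking", "depth", "area"]
subset = df_flat[col_features]
```

subset can then be passed to pd.plotting.scatter_matrix(subset, alpha=0.2, figsize=(14, 8)). Note that stripping prefixes would produce duplicate column names if two parents shared a child key, which is not the case for the structure shown in the question.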

Or see "Pandas - How to flatten a hierarchical index in columns",

where df.columns = [' '.join(col).strip() for col in df.columns.values] should be what you are looking for.
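As an illustration of that one-liner, the sketch below builds a small frame with two-level columns (an assumption: the frame in the question may not have hierarchical columns at all) and flattens it in two ways. The data and the names `joined`, `leaf` and `flat` are hypothetical.

```python
import pandas as pd

# Hypothetical frame with hierarchical (two-level) columns.
columns = pd.MultiIndex.from_tuples([
    ("turns", "fromBathroom"), ("turns", "fromParking"),
    ("distances", "dfromBathroom"), ("distances", "dfromParking"),
    ("depth", ""), ("area", ""),
])
df = pd.DataFrame([[1, 3, 4.0, 12.5, 2, 20.0],
                   [2, 1, 7.5, 3.0, 3, 15.5]], columns=columns)

# Variant from the linked answer: join both levels into one string.
joined = [" ".join(col).strip() for col in df.columns.values]

# Alternative: keep only the innermost non-empty level, which gives
# the bare feature names used in the question.
leaf = [next(part for part in reversed(col) if part)
        for col in df.columns.values]

flat = df.copy()
flat.columns = leaf
print(joined)
print(flat.columns.tolist())
```

The joined variant keeps the parent name ('turns fromBathroom'), which avoids name clashes; the leaf variant matches the col_features list from the question directly.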


python

dataframe

dimensionality-reduction

pandas

jupyter-notebook