Complex dict to multicolumn dataframe

I have a complex dictionary that stores values at various "depths". The structure looks like this:

{
   "key1":"value1",
   "key2":[
      {
         "key2.1a":"value2.1a",
         "key2.2a":"value2.2a",
         "key2.3a":{
            "keya2.3.1a":"value2.3.1a"
         },     
         "key2.4a":"value2.4a",
         "key2.5a":"value2.5a",
         "key2.6a":"value2.6a",
         "key2.7a":"value2.7a",
         "key2.8a":"value2.8a",
         "key2.9a":"value2.9a",
         "key2.10a":{
            "key2.10.1a":"value2.10.1a",
            "key2.10.2a":"value2.10.2a",
            "key2.10.3a":"value2.10.3a",
            "key2.10.4a":{
               "key2.10.4.1a":"value2.10.4.1a"
            }
         },
         "key2.11a":{
            "key2.11.1a":"value2.11.1a",
            "key2.11.2a":"value2.11.2a"
         },
         "key2.12a":"value2.12a",
         "key2.13a":"value2.13a"
      },
      {
         "key2.1b":"value2.1b",
         "key2.2b":"value2.2b",
         "key2.3b":{
            "keya2.3.1b":"value2.3.1b"
         },     
         "key2.4b":"value2.4b",
         "key2.5b":"value2.5b",
         "key2.6b":"value2.6b",
         "key2.7b":"value2.7b",
         "key2.8b":"value2.8b",
         "key2.9b":"value2.9b",
         "key2.10b":{
            "key2.10.1b":"value2.10.1b",
            "key2.10.2b":"value2.10.2b",
            "key2.10.3b":"value2.10.3b",
            "key2.10.4b":{
               "key2.10.4.1b":"value2.10.4.1b"
            }
         },
         "key2.11b":{
            "key2.11.1b":"value2.11.1b",
            "key2.11.2b":"value2.11.2b"
         },
         "key2.12b":"value2.12b",
         "key2.13b":"value2.13b"
      }
      ],
    "key3":"value3"
}

The numbers represent the "depth" in the tree, and the letters ("a" and "b") mark separate records.

I would like a DataFrame with hierarchically indexed columns, looking more or less like this:

So far I have tried using a MultiIndex for the columns:

columns = pd.MultiIndex.from_product([["key1", "key2", "key3"], ["key2.1","key2.2","key2.3"]])
df = pd.DataFrame(dict, columns = columns)

But this gives me an empty DataFrame. Is there a way to specify a "path" for each column?

import pandas as pd
from pandas import DataFrame

nested_dict = {
   "key1":"value1",
   "key2":[
      {
         "key2.1a":"value2.1a",
         "key2.2a":"value2.2a",
         "key2.3a":{
            "keya2.3.1a":"value2.3.1a"
         },     
         "key2.4a":"value2.4a",
         "key2.5a":"value2.5a",
         "key2.6a":"value2.6a",
         "key2.7a":"value2.7a",
         "key2.8a":"value2.8a",
         "key2.9a":"value2.9a",
         "key2.10a":{
            "key2.10.1a":"value2.10.1a",
            "key2.10.2a":"value2.10.2a",
            "key2.10.3a":"value2.10.3a",
            "key2.10.4a":{
               "key2.10.4.1a":"value2.10.4.1a"
            }
         },
         "key2.11a":{
            "key2.11.1a":"value2.11.1a",
            "key2.11.2a":"value2.11.2a"
         },
         "key2.12a":"value2.12a",
         "key2.13a":"value2.13a"
      },
      {
         "key2.1b":"value2.1b",
         "key2.2b":"value2.2b",
         "key2.3b":{
            "keya2.3.1b":"value2.3.1b"
         },     
         "key2.4b":"value2.4b",
         "key2.5b":"value2.5b",
         "key2.6b":"value2.6b",
         "key2.7b":"value2.7b",
         "key2.8b":"value2.8b",
         "key2.9b":"value2.9b",
         "key2.10b":{
            "key2.10.1b":"value2.10.1b",
            "key2.10.2b":"value2.10.2b",
            "key2.10.3b":"value2.10.3b",
            "key2.10.4b":{
               "key2.10.4.1b":"value2.10.4.1b"
            }
         },
         "key2.11b":{
            "key2.11.1b":"value2.11.1b",
            "key2.11.2b":"value2.11.2b"
         },
         "key2.12b":"value2.12b",
         "key2.13b":"value2.13b"
      }
      ],
    "key3":"value3"
} 

pd_dataframe = pd.DataFrame(nested_dict)
print(pd_dataframe)

pd_dataframe.transpose()

I figured it out. I thought I had to supply some kind of path for every branch of the nested dictionary, but there is a more Pythonic way to do it:

df = pd.json_normalize(dict, record_path='key2', max_level=4)

This does not create MultiIndex columns as I originally wanted, just flat columns with repeated values in them. But it is a workable solution.
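If MultiIndex columns are still wanted, the flat names produced by `pd.json_normalize` can be split back into paths. Below is a minimal sketch, not a definitive implementation. It assumes a small stand-in dictionary (`data`) in which both records share the same key names, unlike the "a"/"b" suffixes in the question, and it uses `"|"` as the separator because the keys themselves contain dots. The names `data`, `SEP`, `flat`, `paths` and `depth` are made up for this illustration.

```python
import pandas as pd

# Stand-in for the question's data (assumption: both records use the same keys).
data = {
    "key1": "value1",
    "key2": [
        {"key2.1": "a1", "key2.3": {"key2.3.1": "a31"}},
        {"key2.1": "b1", "key2.3": {"key2.3.1": "b31"}},
    ],
    "key3": "value3",
}

# The keys contain ".", so use a separator that does not occur in any key.
SEP = "|"

# One row per record under "key2"; "key1" and "key3" are repeated as metadata.
flat = pd.json_normalize(
    data,
    record_path="key2",
    meta=["key1", "key3"],
    record_prefix="key2" + SEP,
    sep=SEP,
)

# Turn each flattened name back into a path and pad short paths with "".
paths = [tuple(col.split(SEP)) for col in flat.columns]
depth = max(len(p) for p in paths)
flat.columns = pd.MultiIndex.from_tuples(
    [p + ("",) * (depth - len(p)) for p in paths]
)

print(flat)
```

With this layout, `flat["key2"]` selects every column that came from the records, and a full tuple such as `("key2", "key2.3", "key2.3.1")` selects a single leaf.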

How about this code?

The output should look like this:

import pandas as pd
from pandas import DataFrame

nested_dict = {
   "key1":"value1",
   "key2":[
      {
         "key2.1a":"value2.1a",
         "key2.2a":"value2.2a",
         "key2.3a":{
            "keya2.3.1a":"value2.3.1a"
         },     
         "key2.4a":"value2.4a",
         "key2.5a":"value2.5a",
         "key2.6a":"value2.6a",
         "key2.7a":"value2.7a",
         "key2.8a":"value2.8a",
         "key2.9a":"value2.9a",
         "key2.10a":{
            "key2.10.1a":"value2.10.1a",
            "key2.10.2a":"value2.10.2a",
            "key2.10.3a":"value2.10.3a",
            "key2.10.4a":{
               "key2.10.4.1a":"value2.10.4.1a"
            }
         },
         "key2.11a":{
            "key2.11.1a":"value2.11.1a",
            "key2.11.2a":"value2.11.2a"
         },
         "key2.12a":"value2.12a",
         "key2.13a":"value2.13a"
      },
      {
         "key2.1b":"value2.1b",
         "key2.2b":"value2.2b",
         "key2.3b":{
            "keya2.3.1b":"value2.3.1b"
         },     
         "key2.4b":"value2.4b",
         "key2.5b":"value2.5b",
         "key2.6b":"value2.6b",
         "key2.7b":"value2.7b",
         "key2.8b":"value2.8b",
         "key2.9b":"value2.9b",
         "key2.10b":{
            "key2.10.1b":"value2.10.1b",
            "key2.10.2b":"value2.10.2b",
            "key2.10.3b":"value2.10.3b",
            "key2.10.4b":{
               "key2.10.4.1b":"value2.10.4.1b"
            }
         },
         "key2.11b":{
            "key2.11.1b":"value2.11.1b",
            "key2.11.2b":"value2.11.2b"
         },
         "key2.12b":"value2.12b",
         "key2.13b":"value2.13b"
      }
      ],
    "key3":"value3"
} 

pd_dataframe = pd.DataFrame(nested_dict)
print(pd_dataframe)

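# Build a dict keyed by (column, row) tuples.
# .items() is used because .iteritems() was removed in pandas 2.0.
reform = {(outerKey, innerKey): values for outerKey, innerDict in pd_dataframe.items() for innerKey, values in innerDict.items()}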
reform

pd.DataFrame(reform)

pd.DataFrame(reform).T
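On the original question of giving each column a "path": one possible approach is to walk each record recursively, collect the path to every leaf as a tuple, and use those tuples as MultiIndex columns. This is only a sketch under assumptions. The helper `flatten` and the sample `records` are hypothetical names and data invented for this example, and the records share key names, which the dictionary in the question does not.

```python
import pandas as pd


def flatten(node, prefix=()):
    """Yield (path, value) pairs for every leaf of a nested dict."""
    for key, value in node.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend into nested dicts, extending the path.
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical records, standing in for the list stored under "key2".
records = [
    {"key2.1": "a1", "key2.3": {"key2.3.1": "a31"}},
    {"key2.1": "b1", "key2.3": {"key2.3.1": "b31"}},
]

# One {path: value} mapping per record.
rows = [dict(flatten(rec)) for rec in records]

# All distinct paths, in order of first appearance.
paths = list(dict.fromkeys(p for row in rows for p in row))

# MultiIndex levels must have equal length, so pad short paths with "".
depth = max(len(p) for p in paths)
columns = pd.MultiIndex.from_tuples(
    [p + ("",) * (depth - len(p)) for p in paths]
)

# Missing leaves become None for records that lack a given path.
df = pd.DataFrame([[row.get(p) for p in paths] for row in rows], columns=columns)
print(df)
```

Unlike the `reform` dictionary above, this keeps one row per record and puts the nesting into the column levels.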