How can I unpack a nested dictionary where not every top-level key has all of the second-level keys?

Question

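I've scraped data about properties from a website. The website provides up to 7 attributes for each property, but the attribute types vary by property (i.e. "Land" property types do not show "Building Size" as an attribute, since there is no building).

As a first step toward solving this, I scraped the attribute types and the values as separate columns and converted the data into dictionary form, where each property has a unique ID_Number and a series of key:value pairs. Now I would like to unpack that dictionary into a dataframe, where the column headers represent all possible second-level keys (the attribute types) and the cell values are the "value" associated with each attribute key.

A sample of the data is below: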

{1: [{'Status:': 'For Lease',
   'Price:': '.17 SF/Mo',
   'Property Type:': 'Retail',
   'Sub-Type:': 'Office, Retail',
   'Spaces:': '2 Spaces',
   'Space Available:': '0.00 - 0.03 AC',
   'Building Size:': '9,161 SF'}],
 2: [{'Status:': 'For Lease',
   'Price:': '.25 SF/Mo',
   'Property Type:': 'Office',
   'Sub-Type:': 'Office',
   'Spaces:': '1 Space',
   'Space Available:': '0.03 AC',
   'Building Size:': '11,332 SF'}],
 3: [{'Status:': 'For Sale',
   'Price:': 2521740,
   'Property Type:': 'Retail',
   'Sub-Type:': 'Fast Food',
   'Building Size:': '2,410 SF',
   'Cap Rate:': 0.0575,
   'Lot Size:': '76,666 SF'}],
 4: [{'Status:': 'For Lease',
   'Price:': '.63 SF/Mo',
   'Property Type:': 'Retail',
   'Sub-Type:': 'Retail',
   'Spaces:': '1 Space',
   'Space Available:': '0.50 AC',
   'Building Size:': '59,095 SF'}]}

How can I extract this? I've tried several variations of from_dict but haven't found a working solution.

Thanks in advance!

Answer 1

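There are a few ways to do this. I'm not a pandas expert, so there may be a more elegant solution, but here's my quick-and-dirty way (by the way, the sample data you provided has 9 unique attributes, not 7). It handles missing values automatically by making them NaN: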

import pandas as pd

data = {1: [{'Building Size:': '9,161 SF',
              'Price:': '.17 SF/Mo',
              'Property Type:': 'Retail',
              'Space Available:': '0.00 - 0.03 AC',
              'Spaces:': '2 Spaces',
              'Status:': 'For Lease',
              'Sub-Type:': 'Office, Retail'}],
         2: [{'Building Size:': '11,332 SF',
              'Price:': '.25 SF/Mo',
              'Property Type:': 'Office',
              'Space Available:': '0.03 AC',
              'Spaces:': '1 Space',
              'Status:': 'For Lease',
              'Sub-Type:': 'Office'}],
         3: [{'Building Size:': '2,410 SF',
              'Cap Rate:': 0.0575,
              'Lot Size:': '76,666 SF',
              'Price:': 2521740,
              'Property Type:': 'Retail',
              'Status:': 'For Sale',
              'Sub-Type:': 'Fast Food'}],
         4: [{'Building Size:': '59,095 SF',
              'Price:': '.63 SF/Mo',
              'Property Type:': 'Retail',
              'Space Available:': '0.50 AC',
              'Spaces:': '1 Space',
              'Status:': 'For Lease',
              'Sub-Type:': 'Retail'}],
        }

# DataFrame.append was removed in pandas 2.0, so collect the rows first
# and build the frame in a single call; keys missing from a row become NaN.
rows = [property_dict
        for property_list in data.values()
        for property_dict in property_list]  # you only have one dict per list
df = pd.DataFrame(rows, index=list(data.keys()))



>>> print(df)
  Building Size:     Price:  ... Cap Rate:  Lot Size:
1       9,161 SF  .17 SF/Mo  ...       NaN        NaN
2      11,332 SF  .25 SF/Mo  ...       NaN        NaN
3       2,410 SF    2521740  ...    0.0575  76,666 SF
4      59,095 SF  .63 SF/Mo  ...       NaN        NaN

[4 rows x 9 columns]
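
Since the question mentions from_dict, here is a minimal sketch of how that route might look. It assumes every list holds exactly one dict (as in the sample), and it uses a trimmed-down, hypothetical two-property sample rather than the full data. The name `flat` is just an illustrative helper, not something from the original post.

```python
import pandas as pd

# Trimmed-down, hypothetical sample with the same shape as the data above.
data = {
    1: [{'Status:': 'For Lease', 'Building Size:': '9,161 SF'}],
    3: [{'Status:': 'For Sale', 'Cap Rate:': 0.0575}],
}

# Assumption: each list holds exactly one dict, so take element 0 of each list.
flat = {property_id: property_list[0]
        for property_id, property_list in data.items()}

# orient='index' turns the outer keys into row labels; the columns are the
# union of all inner keys, and attributes a property lacks are filled with NaN.
df = pd.DataFrame.from_dict(flat, orient='index')
print(df)
```

The likely reason from_dict did not work directly on the scraped data is the extra list wrapped around each inner dict; once that layer is removed, the call should align the second-level keys by itself.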

python

dictionary

unpack

pandas