Converting a list of lists of dictionaries to a Pandas DataFrame

What might be the best way to handle a data structure that is a list of lists of dictionaries, like the one I'm working with:

       [[{'name': 'Export A Smooth'},
       {'filter': 'unfiltered'},
       {'number of cigarette': 25},
       {'nicotine content': 10.5},
       {'tar content': 15.0},
       {'menthol': False},
       {'king size': False},
       {'price': 18.99},
       {'units sold per week': 50},
       {'profits per week': 949.50}],

      [{'name': 'Export A Medium'},
       {'filter': 'white'},
       {'number of cigarette': 25},
       {'nicotine content': 10.0},
       {'tar content': 12.0},
       {'menthol': False},
       {'king size': False},
       {'price': 18.99},
       {'units sold per week': 39},
       {'profits per week': 740.61}],

      [{'name': 'Canadian Classics Select'},
       {'filter': 'brown'},
       {'number of cigarette': 25},
       {'nicotine content': 11.1},
       {'tar content': 11.0},
       {'menthol': True},
       {'king size': True},
       {'price': 19.09},
       {'units sold per week': 38},
       {'profits per week': 725.42}]]

and convert it into a structured table format:

    name                      Filter      Number of Cigarettes
    Export A Smooth           unfiltered  25
    Export A Medium           white       25
    Canadian Classics Select  brown       25

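I've tried a few different approaches to get the right table format. The table layout comes out correct, but lots of NaN values show up for every cigarette except the first one (Export A Smooth).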

    unit  name             filter      ...  profits per week
    1     Export A Smooth  NaN         ...  900 NaN
    2     NaN              unfiltered  ...  NaN
    3     NaN              NaN         ...  NaN
    4     NaN              NaN         ...  NaN
    5     NaN              NaN         ...  NaN
    ..    ...              ...         ...  ...
    155   NaN              NaN         ...  NaN
    156   NaN              NaN         ...  NaN
    157   NaN              NaN         ...  NaN
    158   NaN              NaN         ...  NaN
    159   NaN              NaN         ...  447.72

I've tried pd.DataFrame(cig_list).stack().apply(pd.Series) and pd.concat([pd.DataFrame(ii) for ii in cigarettes]), as well as iterating over the cigs and passing them into a DataFrame that way:

    cig_list_items = []
    for items in cig_list:
        for item in items:
            cig_list_items.append(item)
    pd.DataFrame(cig_list_items)

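They all return the same result, so I assume something must be wrong with the format of the dictionaries? I suspect the dictionaries need to be rearranged so that they read more like this: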

[[{'name': 'Export A Smooth'},
  {'name': 'Export A Medium'},
  {'name': 'Pall Mall Bold'}],

 [{'filter': 'unfiltered'},
  {'filter': 'white'},
  {'filter': 'regular'}]]

Let's assume your list of lists is in the variable lst, then try this:

# Flatten the list of lists into a single list of dictionaries.
flat_list = [item for sublist in lst for item in sublist]

df = pd.json_normalize(flat_list)

First, it flattens the list of lists into a single list in which every item is a dictionary. Then it converts the whole thing into a pandas DataFrame.
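One thing worth checking before relying on this: in the question's data every dictionary holds a single key, so a flat list gives one row per dictionary rather than one row per cigarette. The sketch below uses a shortened stand-in for the data (two products, two keys each; the names lst, flat_list and df_flat are only illustrative) to show the effect:

```python
import pandas as pd

# Shortened stand-in for the question's data: two products,
# each described by two single-key dictionaries.
lst = [
    [{'name': 'Export A Smooth'}, {'filter': 'unfiltered'}],
    [{'name': 'Export A Medium'}, {'filter': 'white'}],
]

# Flatten the list of lists into one list of dictionaries.
flat_list = [item for sublist in lst for item in sublist]
df_flat = pd.json_normalize(flat_list)

# Every single-key dictionary becomes its own row,
# so the column it does not mention is filled with NaN.
print(df_flat.shape)               # (4, 2)
print(df_flat.isna().sum().sum())  # 4
```

If that matches what you are seeing, the dictionaries of each inner list probably need to be merged into one record before the DataFrame is built.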

Since each entry is a separate dictionary, you can merge them with a list comprehension wrapped around a dict comprehension:

# data is the list of lists of dictionaries from the question
df = pd.DataFrame([{k: v for d in i for k, v in d.items()} for i in data])

print(df)

                       name      filter  number of cigarette  nicotine content  tar content  menthol  king size  price  units sold per week  profits per week
0           Export A Smooth  unfiltered                   25              10.5         15.0    False      False  18.99                   50            949.50
1           Export A Medium       white                   25              10.0         12.0    False      False  18.99                   39            740.61
2  Canadian Classics Select       brown                   25              11.1         11.0     True       True  19.09                   38            725.42
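As an aside, the same per-record merge can probably also be written with collections.ChainMap from the standard library. This is an alternative sketch and not part of the answer above; the shortened data and the names records and df_chain are assumptions made for illustration:

```python
from collections import ChainMap

import pandas as pd

# Shortened stand-in for the question's data.
data = [
    [{'name': 'Export A Smooth'}, {'filter': 'unfiltered'}, {'price': 18.99}],
    [{'name': 'Export A Medium'}, {'filter': 'white'}, {'price': 18.99}],
]

# ChainMap looks keys up in the first mapping first, so each record is
# reversed to let later dictionaries win, as they do in the comprehension.
records = [dict(ChainMap(*reversed(record))) for record in data]

df_chain = pd.DataFrame(records)
print(df_chain)
```

For single-key dictionaries with no repeated keys, both versions should yield one row per inner list; the comprehension is arguably the more direct of the two.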

If you find the comprehension hard to read, here is what it does step by step:

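new_list = []
for i in data:
    # Collect all key/value pairs of one inner list into a single dictionary.
    new_dict = {}
    for j in i:
        for key, item in j.items():
            new_dict[key] = item
    new_list.append(new_dict)

# One merged dictionary per inner list gives one row per cigarette.
df = pd.DataFrame(new_list)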
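The loop silently overwrites a value if the same key appears twice within one inner list. If that could happen in your data, a small guard might help. The following is a minimal sketch; merge_record is a hypothetical helper name, not a pandas function, and the data is shortened for illustration:

```python
import pandas as pd


def merge_record(dicts):
    """Merge a list of small dicts into one dict, refusing repeated keys."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            if key in merged:
                raise ValueError(f"duplicate key in record: {key!r}")
            merged[key] = value
    return merged


# Shortened stand-in for the question's data.
data = [
    [{'name': 'Export A Smooth'}, {'filter': 'unfiltered'}],
    [{'name': 'Export A Medium'}, {'filter': 'white'}],
]

df_checked = pd.DataFrame([merge_record(record) for record in data])
print(df_checked)
```

Raising an error is only one option; depending on the data, keeping the first or the last value may be just as reasonable.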