Converting a list of lists of dictionaries to a Pandas DataFrame
What would be the best way to handle a data structure that is a list of lists of dictionaries, like the one I am working with:
[[{'name': 'Export A Smooth'},
{'filter': 'unfiltered'},
{'number of cigarette': 25},
{'nicotine content': 10.5},
{'tar content': 15.0},
{'menthol': False},
{'king size': False},
{'price': 18.99},
{'units sold per week': 50},
{'profits per week': 949.50}],
[{'name': 'Export A Medium'},
{'filter': 'white'},
{'number of cigarette': 25},
{'nicotine content': 10.0},
{'tar content': 12.0},
{'menthol': False},
{'king size': False},
{'price': 18.99},
{'units sold per week': 39},
{'profits per week': 740.61}],
[{'name': 'Canadian Classics Select'},
{'filter': 'brown'},
{'number of cigarette': 25},
{'nicotine content': 11.1},
{'tar content': 11.0},
{'menthol': True},
{'king size': True},
{'price': 19.09},
{'units sold per week': 38},
{'profits per week': 725.42}]]
and convert it into a structured table format:
name | Filter | Number of Cigarettes |
---|---|---|
Export A Smooth | unfiltered | 25 |
Export A Medium | white | 25 |
Canadian Classics Select | brown | 20 |
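I have tried several different approaches to get the right table format. The table layout comes out correctly, but lots of NaN values show up for every cigarette except the first one (Export A Smooth).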
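unit | name | filter | ... | profits per week |
---|---|---|---|---|
1 | Export A Smooth | NaN | ... 900 | NaN |
2 | NaN | unfiltered | ... | NaN |
3 | NaN | NaN | ... | NaN |
4 | NaN | NaN | ... | NaN |
5 | NaN | NaN | ... | NaN |
.. | ... | ... | ... | ... |
155 | NaN | NaN | ... | NaN |
156 | NaN | NaN | ... | NaN |
157 | NaN | NaN | ... | NaN |
158 | NaN | NaN | ... | NaN |
159 | NaN | NaN | ... | 447.72 |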
I have tried pd.DataFrame(cig_list).stack().apply(pd.Series)
and pd.concat([pd.DataFrame(ii) for ii in cigarettes]),
as well as looping over the cigarettes and passing them into a DataFrame this way:
cig_list_items = []
for items in cig_list:
    for _ in items:
        cig_list_items.append(_)
pd.DataFrame(cig_list_items)
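They all return the same result, so I assume something must be wrong with how the dictionaries are laid out? I suspect the dictionaries need to be rearranged so that they read more like this: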
[[{'name': 'Export A Smooth'},
  {'name': 'Export A Medium'},
  {'name': 'Pall Mall Bold'}],
 [{'filter': 'unfiltered'},
  {'filter': 'white'},
  {'filter': 'regular'}]]
Let's assume your list of lists is in the variable lst, then try this:
import pandas as pd
# Flatten the list of lists into a single list of dictionaries
flat_list = [item for sublist in lst for item in sublist]
df = pd.json_normalize(flat_list)
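First, it flattens the list of lists into a single list in which every item is a dictionary. Then it converts the whole thing into a pandas DataFrame. Note that with single-key dictionaries like the ones in the question, every dictionary still becomes its own row, so the keys have to be merged per cigarette to get one row each.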
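To see why flattening on its own reproduces the NaN pattern from the question, here is a minimal sketch. It assumes a trimmed-down sample with two cigarettes and two keys each; the names flat_df, merged and merged_df are only illustrative.

```python
import pandas as pd

# Two cigarettes, trimmed to two keys each, in the question's layout.
lst = [
    [{'name': 'Export A Smooth'}, {'filter': 'unfiltered'}],
    [{'name': 'Export A Medium'}, {'filter': 'white'}],
]

# Flattening alone gives four single-key dicts, so four rows.
flat_list = [item for sublist in lst for item in sublist]
flat_df = pd.json_normalize(flat_list)
print(flat_df.shape)                     # (4, 2)
print(int(flat_df.isna().sum().sum()))   # 4 missing cells

# Merging each inner list first gives one dict, and one row, per cigarette.
merged = [{k: v for d in row for k, v in d.items()} for row in lst]
merged_df = pd.DataFrame(merged)
print(merged_df.shape)                   # (2, 2)
```

Each row of flat_df holds a value for only one column, which is where the diagonal of NaN values comes from.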
Since each entry is a separate dictionary, you can merge them with a list comprehension wrapped around a dict comprehension:
import pandas as pd
# data is the list of lists of single-key dicts from the question
df = pd.DataFrame([{k: v for d in row for k, v in d.items()} for row in data])
print(df)
                       name      filter  number of cigarette  nicotine content  tar content  menthol  king size  price  units sold per week  profits per week
0           Export A Smooth  unfiltered                   25              10.5         15.0    False      False  18.99                   50            949.50
1           Export A Medium       white                   25              10.0         12.0    False      False  18.99                   39            740.61
2  Canadian Classics Select       brown                   25              11.1         11.0     True       True  19.09                   38            725.42
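The table asked for in the question only has three columns, so the merged frame may still need trimming. A possible sketch, assuming a shortened sample of the question's data; the names wanted and table are made up for illustration:

```python
import pandas as pd

# Shortened sample in the question's layout (only some keys kept).
data = [
    [{'name': 'Export A Smooth'}, {'filter': 'unfiltered'},
     {'number of cigarette': 25}, {'price': 18.99}],
    [{'name': 'Export A Medium'}, {'filter': 'white'},
     {'number of cigarette': 25}, {'price': 18.99}],
]
df = pd.DataFrame([{k: v for d in row for k, v in d.items()} for row in data])

# Map the existing column names to the headers of the desired table.
wanted = {
    'name': 'name',
    'filter': 'Filter',
    'number of cigarette': 'Number of Cigarettes',
}
# Select those columns in order, then relabel them.
table = df[list(wanted)].rename(columns=wanted)
print(table)
```

Selecting with a list of column names keeps the order given in wanted, and rename leaves any label it is not told about unchanged.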
If you find the comprehension hard to read, here is what it does, written out as loops:
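import pandas as pd

newlist = []
for i in data:                       # i: the list of single-key dicts for one cigarette
    new_dict = {}
    for j in i:                      # j: one single-key dict
        for key, item in j.items():
            new_dict[key] = item     # collect every key into one dict per cigarette
    newlist.append(new_dict)
df = pd.DataFrame(newlist)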
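The loop can also be wrapped in a small helper and checked against the comprehension. This is only a sketch: merge_records is a hypothetical name, and the sample is a shortened version of the question's data.

```python
import pandas as pd

def merge_records(data):
    """Merge each inner list of single-key dicts into one dict per row."""
    merged = []
    for row in data:
        combined = {}
        for d in row:
            combined.update(d)  # on a repeated key, the later dict wins
        merged.append(combined)
    return merged

# Shortened sample in the question's layout.
data = [
    [{'name': 'Export A Smooth'}, {'filter': 'unfiltered'}, {'price': 18.99}],
    [{'name': 'Export A Medium'}, {'filter': 'white'}, {'price': 18.99}],
]

loop_df = pd.DataFrame(merge_records(data))
comp_df = pd.DataFrame([{k: v for d in row for k, v in d.items()} for row in data])
print(loop_df.equals(comp_df))  # True
```

Both versions silently keep the last value if a key appears twice within one inner list, which is worth knowing if the real data is less regular than the sample.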