How to expand a df column of different dicts into separate columns?

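I have a df with different dicts as entries in one column, in my case the column "information". I would like to expand the df by every key that occurs in those dicts (all possible dict.keys()). The data looks like this: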

import pandas as pd
import numpy as np
df = pd.DataFrame({'id': pd.Series([1, 2, 3, 4, 5]),
                   'name': pd.Series(['banana',
                                      'apple',
                                      'orange',
                                      'strawberry',
                                      'toast']),
                   'information': pd.Series([{'shape':'curve','color':'yellow'},
                                             {'color':'red'},
                                             {'shape':'round'},
                                             {'amount':500},
                                             np.nan]),
                   'cost': pd.Series([1,2,2,10,4])})


   id        name                            information  cost
0   1      banana  {'shape': 'curve', 'color': 'yellow'}     1
1   2       apple                       {'color': 'red'}     2
2   3      orange                     {'shape': 'round'}     2
3   4  strawberry                        {'amount': 500}    10
4   5       toast                                    NaN     4

The result should look like this:

   id        name  shape   color  amount  cost
0   1      banana  curve  yellow     NaN     1
1   2       apple    NaN     red     NaN     2
2   3      orange  round     NaN     NaN     2
3   4  strawberry    NaN     NaN   500.0    10
4   5       toast    NaN     NaN     NaN     4

You can use:

# v != v is only True for NaN, so missing entries become empty dicts
d = {k: {} if v != v else v for k, v in df.pop('information').items()}
df1 = pd.DataFrame.from_dict(d, orient='index')
df = pd.concat([df, df1], axis=1)
print(df)
   id        name  cost  shape   color  amount
0   1      banana     1  curve  yellow     NaN
1   2       apple     2    NaN     red     NaN
2   3      orange     2  round     NaN     NaN
3   4  strawberry    10    NaN     NaN   500.0
4   5       toast     4    NaN     NaN     NaN
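
The NaN handling in that dict comprehension can be checked in isolation. The following is a minimal sketch with made-up sample values (not the data from the question). It relies on NaN being unequal to itself, and compares that with an explicit type check, which is an alternative I am suggesting here rather than part of the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not the data from the question.
values = [{'shape': 'curve'}, np.nan, {'amount': 500}]

# NaN is unequal to itself, while a dict always equals itself,
# so only the missing entry is swapped for an empty dict.
by_comparison = [{} if v != v else v for v in values]

# The same decision made by type; this would also catch None,
# which the v != v test lets through (None != None is False).
by_type = [v if isinstance(v, dict) else {} for v in values]

print(by_comparison)
print(by_type)
```

If the column might contain None as well as NaN, the type check is probably the safer of the two.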

Another approach is to use pandas.DataFrame.from_records:

import pandas as pd

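# Replace missing entries with empty dicts so that every record is a mapping
new = pd.DataFrame.from_records(df.pop('information').apply(lambda x: {} if pd.isna(x) else x))
# axis must be passed by keyword; pandas 2.0 no longer accepts it positionally
new = pd.concat([df, new], axis=1)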
print(new)

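Output (the column order depends on the pandas version; in this run the columns came out sorted alphabetically):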

   cost  id        name  amount   color  shape
0     1   1      banana     NaN  yellow  curve
1     2   2       apple     NaN     red    NaN
2     2   3      orange     NaN     NaN  round
3    10   4  strawberry   500.0     NaN    NaN
4     4   5       toast     NaN     NaN    NaN
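
Both answers append the new columns at the end, whereas the desired output in the question has them where "information" used to be. Here is a minimal sketch of one way to get that layout. The helper name `expand_dict_column` is hypothetical, and it assumes the column name is unique and that no dict key clashes with an existing column name:

```python
import numpy as np
import pandas as pd


def expand_dict_column(df, column):
    """Return a copy of df with `column` replaced by one column per dict key."""
    # Anything that is not a dict (NaN, None) counts as "no information".
    records = [v if isinstance(v, dict) else {} for v in df[column]]
    expanded = pd.DataFrame(records, index=df.index)

    # Split the remaining columns around the position of the dict column.
    pos = df.columns.get_loc(column)
    left = df.iloc[:, :pos]
    right = df.iloc[:, pos + 1:]
    return pd.concat([left, expanded, right], axis=1)


# Same data as in the question.
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'name': ['banana', 'apple', 'orange', 'strawberry', 'toast'],
                   'information': [{'shape': 'curve', 'color': 'yellow'},
                                   {'color': 'red'},
                                   {'shape': 'round'},
                                   {'amount': 500},
                                   np.nan],
                   'cost': [1, 2, 2, 10, 4]})

result = expand_dict_column(df, 'information')
print(result)
```

Unlike the df.pop variants, this leaves the original df untouched, so it can be run repeatedly without rebuilding the data.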