pivot df with duplicates as new rows
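Evening, I have a dataframe that I want to reshape. Some columns have duplicate id variables, and I want the duplicate values to appear as new rows.
My data looks like the below. I want the ids as rows, the groups as columns, and the choices as values. If multiple choices were picked for an id within one group, the row should be replicated as shown below. When I use pivot I just end up with the mean or sum of the combined values, e.g. 11.5 for id i1, group1. All tips very welcome, thank you.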
import pandas as pd
import numpy as np
df = pd.DataFrame({'id': ['i1','i1','i1','i2','i2','i2','i2','i2','i3','i3'],
'group': ['group1','group1','group2','group3','group1','group2','group2','group3','group1','group2'],
'choice':[12,11,12,14,11,19,9,7,8,9]})
# desired output
pd.DataFrame({'id': ['i1','i1','i2','i2','i3'],
'group1': [12,11,11,np.nan,8],
'group2': [12,np.nan,19,9,9],
'group3':[np.nan,np.nan,14,7,np.nan]})
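For reference, the 11.5 mentioned above is what `pivot_table` produces, because its default `aggfunc` is the mean, while plain `pivot` does not aggregate and raises on duplicate id/group pairs. A minimal sketch reproducing both on the same data (the names `wide_mean` and `pivot_failed` are only for illustration and are not part of the original post):

```python
import pandas as pd

df = pd.DataFrame({'id': ['i1','i1','i1','i2','i2','i2','i2','i2','i3','i3'],
                   'group': ['group1','group1','group2','group3','group1',
                             'group2','group2','group3','group1','group2'],
                   'choice': [12,11,12,14,11,19,9,7,8,9]})

# pivot_table aggregates duplicate (id, group) pairs; the default aggfunc is the mean
wide_mean = df.pivot_table(index='id', columns='group', values='choice')
print(wide_mean.loc['i1', 'group1'])  # mean of 12 and 11

# pivot does not aggregate, so duplicate (id, group) pairs raise a ValueError
try:
    df.pivot(index='id', columns='group', values='choice')
    pivot_failed = False
except ValueError:
    pivot_failed = True
print(pivot_failed)
```

Neither call can keep both 12 and 11 for i1/group1 as separate rows, which is why an extra key that distinguishes the repeats is needed.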
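Use GroupBy.cumcount with Series.unstack and DataFrame.droplevel. The counter numbers the repeats within each id/group pair, so the index is unique before unstacking: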
g = df.groupby(['id','group']).cumcount().add(1)
df = (df.set_index(['id','group', g])['choice']
.unstack(level=1)
.droplevel(level=1)
.rename_axis(None,axis=1)
.reset_index())
print (df)
   id  group1  group2  group3
0  i1    12.0    12.0     NaN
1  i1    11.0     NaN     NaN
2  i2    11.0    19.0    14.0
3  i2     NaN     9.0     7.0
4  i3     8.0     9.0     NaN
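The same idea can also be written with `DataFrame.pivot` by storing the counter in a helper column. This is only a sketch of an alternative, not the method from the answer above; it assumes pandas 1.1 or later (where `pivot` accepts a list for `index`), and the column name `n` and the variable `wide` are hypothetical names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id': ['i1','i1','i1','i2','i2','i2','i2','i2','i3','i3'],
                   'group': ['group1','group1','group2','group3','group1',
                             'group2','group2','group3','group1','group2'],
                   'choice': [12,11,12,14,11,19,9,7,8,9]})

# Number each repeat within an (id, group) pair: 0 for the first, 1 for the second, ...
df['n'] = df.groupby(['id', 'group']).cumcount()

# With (id, n) as the index every (index, column) pair is unique, so pivot works
wide = (df.pivot(index=['id', 'n'], columns='group', values='choice')
          .reset_index()
          .drop(columns='n')             # the helper counter is no longer needed
          .rename_axis(None, axis=1))    # remove the 'group' label on the columns
print(wide)
```

Whether the counter starts at 0 or 1 makes no difference here, since it only serves to tell the repeated rows apart and is dropped at the end.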