How to use self join to reshape duplicate rows in Pandas?
I have duplicate rows that I want to combine into one row per ID. They look like this:
ID Col1 Col2 Col3 ... Col46
-----------------------------------
id1 a1 b1 c1 ... x1
id2 a2 b2 c2 ... x2
id1 a1 b1 c1 ... y1
id3 a3 b3 c3 ... x3
id3 a3 b3 c3 ... y3
id3 a3 b3 c3 ... z3
What I want to get is:
ID Col1 Col2 Col3 ... Col46 Col47 Col48
----------------------------------------------------
id1 a1 b1 c1 ... x1 y1 None
id2 a2 b2 c2 ... x2 None None
id3 a3 b3 c3 ... x3 y3 z3
To do this, I am using merge:
data_cliq = self.cliq.copy()
self.cliq = pd.merge(self.cliq, data_cliq, on = 'ID', how = 'inner')
But I think I need something more sophisticated than this, because it does not give me the result I want.
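A small sketch of why the self-merge falls short, using a cut-down, hypothetical version of the sample data (only `ID` and `Col46`): an inner merge on `ID` pairs every row with every other row that shares the same ID, so the row count grows instead of shrinking to one row per ID.

```python
import pandas as pd

# Hypothetical reduced sample: only the key and the column that varies.
df = pd.DataFrame({
    "ID": ["id1", "id2", "id1", "id3", "id3", "id3"],
    "Col46": ["x1", "x2", "y1", "x3", "y3", "z3"],
})

# Self-merge on ID: each ID with n rows produces n * n rows.
merged = pd.merge(df, df.copy(), on="ID", how="inner")

# Count the rows produced for each ID.
rows_per_id = merged.groupby("ID").size()
print(rows_per_id)
```

With 2, 1, and 3 rows for the three IDs, the merge yields 4 + 1 + 9 = 14 rows, each holding one pair of `Col46` values rather than all values for that ID side by side.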
I think you need to create groups with cumcount first and then use pivot_table:
df['g'] = df.groupby('ID')['Col46'].cumcount()
df = df.pivot_table(index=['ID','Col1','Col2','Col3'],
columns='g',
values='Col46',
aggfunc=''.join).reset_index()
print (df)
g ID Col1 Col2 Col3 0 1 2
0 id1 a1 b1 c1 x1 y1 None
1 id2 a2 b2 c2 x2 None None
2 id3 a3 b3 c3 x3 y3 z3
If you need to rename the columns:
df['g'] = 'Col' + (df.groupby('ID')['Col46'].cumcount() + 46).astype(str)
df = df.pivot_table(index=['ID','Col1','Col2','Col3'],
columns='g',
values='Col46',
aggfunc=''.join).reset_index()
print (df)
g ID Col1 Col2 Col3 Col46 Col47 Col48
0 id1 a1 b1 c1 x1 y1 None
1 id2 a2 b2 c2 x2 None None
2 id3 a3 b3 c3 x3 y3 z3
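To try the idea end to end, here is a minimal, self-contained sketch on a hypothetical reconstruction of the sample data (the columns elided by "..." are left out). It keeps the cumcount step but swaps pivot_table for set_index plus unstack, which is an alternative that should give the same shape here because each (ID, counter) pair occurs only once, so no aggregation function is needed. The names `key_cols` and `wide` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the question (elided columns omitted).
df = pd.DataFrame({
    "ID":    ["id1", "id2", "id1", "id3", "id3", "id3"],
    "Col1":  ["a1", "a2", "a1", "a3", "a3", "a3"],
    "Col2":  ["b1", "b2", "b1", "b3", "b3", "b3"],
    "Col3":  ["c1", "c2", "c1", "c3", "c3", "c3"],
    "Col46": ["x1", "x2", "y1", "x3", "y3", "z3"],
})

key_cols = ["ID", "Col1", "Col2", "Col3"]

# Number the rows within each ID (0, 1, 2, ...) and turn that counter
# into the target column name: Col46, Col47, Col48, ...
df["g"] = "Col" + (df.groupby("ID").cumcount() + 46).astype(str)

# Move the counter from the row index into the columns.
wide = (
    df.set_index(key_cols + ["g"])["Col46"]
      .unstack("g")
      .reset_index()
)
wide.columns.name = None  # drop the leftover "g" label on the column axis

print(wide)
```

IDs with fewer duplicates than the maximum get missing values in the extra columns. Note that the generated column labels are strings, so they sort lexicographically if the counter ever crosses a digit boundary.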