How to use self join to reshape duplicate rows in Pandas?

Question

I have duplicate rows that I want to combine into a single row per ID. They look like this:

ID    Col1   Col2   Col3  ... Col46
-----------------------------------
id1    a1     b1     c1   ...  x1
id2    a2     b2     c2   ...  x2
id1    a1     b1     c1   ...  y1
id3    a3     b3     c3   ...  x3
id3    a3     b3     c3   ...  y3
id3    a3     b3     c3   ...  z3

What I want to get is:

ID    Col1   Col2   Col3  ...  Col46   Col47   Col48
----------------------------------------------------
id1    a1     b1     c1   ...   x1      y1      None
id2    a2     b2     c2   ...   x2     None     None
id3    a3     b3     c3   ...   x3      y3       z3

To do this, I am using merge:

  data_cliq = self.cliq.copy()
  self.cliq = pd.merge(self.cliq, data_cliq, on = 'ID', how = 'inner')

But I think I need something more elaborate than this, because it does not give me the result I want.

Answer 1

I think you need to create groups with cumcount first and then use pivot_table:

df['g'] = df.groupby('ID')['Col46'].cumcount()

df = df.pivot_table(index=['ID','Col1','Col2','Col3'], 
                    columns='g', 
                    values='Col46', 
                    aggfunc=''.join).reset_index()

print (df)

g   ID Col1 Col2 Col3   0     1     2
0  id1   a1   b1   c1  x1    y1  None
1  id2   a2   b2   c2  x2  None  None
2  id3   a3   b3   c3  x3    y3    z3
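
To make the snippet above reproducible, here is a minimal sketch. It assumes a small frame shaped like the one in the question, with only Col1 to Col3 and Col46 included for brevity; the variable name wide is just for illustration. cumcount numbers the rows within each ID (0, 1, 2, ...), and pivot_table then spreads the Col46 values across one column per number:

```python
import pandas as pd

# Toy data modeled on the question (assumption: Col4..Col45 are left out for brevity).
df = pd.DataFrame({
    'ID':    ['id1', 'id2', 'id1', 'id3', 'id3', 'id3'],
    'Col1':  ['a1', 'a2', 'a1', 'a3', 'a3', 'a3'],
    'Col2':  ['b1', 'b2', 'b1', 'b3', 'b3', 'b3'],
    'Col3':  ['c1', 'c2', 'c1', 'c3', 'c3', 'c3'],
    'Col46': ['x1', 'x2', 'y1', 'x3', 'y3', 'z3'],
})

# Position of each row within its ID group: 0 for the first row, 1 for the second, ...
df['g'] = df.groupby('ID')['Col46'].cumcount()

# One output row per (ID, Col1, Col2, Col3); one output column per position g.
# Each cell holds a single string, so ''.join simply returns that string.
wide = df.pivot_table(index=['ID', 'Col1', 'Col2', 'Col3'],
                      columns='g',
                      values='Col46',
                      aggfunc=''.join).reset_index()

print(wide)
```

IDs with fewer duplicates than the largest group get missing values in the extra columns, which may print as None or NaN depending on the pandas version.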

If you need to rename the columns:

df['g'] = 'Col' + (df.groupby('ID')['Col46'].cumcount() + 46).astype(str)

df = df.pivot_table(index=['ID','Col1','Col2','Col3'], 
                    columns='g', 
                    values='Col46', 
                    aggfunc=''.join).reset_index()

print (df)
g   ID Col1 Col2 Col3 Col46 Col47 Col48
0  id1   a1   b1   c1    x1    y1  None
1  id2   a2   b2   c2    x2  None  None
2  id3   a3   b3   c3    x3    y3    z3
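
As for why the merge in the question does not give this shape: an inner self merge on ID pairs every row with every other row that has the same ID, so a group of n duplicates becomes n * n rows instead of one. A small sketch, assuming a cut-down frame named cliq with only ID and Col46:

```python
import pandas as pd

# Cut-down version of the question's data (assumption: only ID and Col46 are kept).
cliq = pd.DataFrame({
    'ID':    ['id1', 'id2', 'id1', 'id3', 'id3', 'id3'],
    'Col46': ['x1', 'x2', 'y1', 'x3', 'y3', 'z3'],
})

# Same call as in the question: inner self merge on ID.
merged = pd.merge(cliq, cliq.copy(), on='ID', how='inner')

# id1 has 2 rows, id2 has 1, id3 has 3, so the result has 2*2 + 1*1 + 3*3 = 14 rows.
print(len(merged))
# Overlapping non-key columns get the default _x / _y suffixes.
print(merged.columns.tolist())
```

So the merge makes the frame longer rather than wider per ID, which is why numbering the duplicates with cumcount and pivoting is a better fit here.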
