Pandas: Reshaping Long Data to Wide with duplicated columns

Question

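I need to reshape a long pandas DataFrame into a wide one. The problem is that, for some ids, the same parameter has multiple values. Some parameters also appear for only a few ids.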

import pandas as pd
df = pd.DataFrame({'indx':[11,11,11,11,12,12,12,13,13,13,13],'param':['a','b','b','c','a','b','d','a','b','c','c'],'value':[100,54,65,65,789,24,98,24,27,75,35]})

indx param  value
11  a   100
11  b   54
11  b   65
11  c   65
12  a   789
12  b   24
12  d   98
13  a   24
13  b   27
13  c   75
13  c   35

I would like to get something like this:

indx  a    b        c        d
11    100  `54,65`  65       None
12    789  24       None     98
13    24   27       `75,35`  None

or

indx  a    b   b1    c     c1    d
11    100  54  65    65    None  None
12    789  24  None  None  None  98
13    24   27  None  75    35    None

So a plain df.pivot() is obviously not a solution here, because it fails on duplicate (indx, param) pairs.
Thanks for your help.

Answer 1

Option 1:

df.astype(str).groupby(['indx', 'param'])['value'].agg(','.join).unstack()

Output:

param    a      b      c    d
indx                         
11     100  54,65     65  NaN
12     789     24    NaN   98
13      24     27  75,35  NaN
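One detail worth noting about Option 1: df.astype(str) converts every column, so the indx index of the result holds strings ('11', '12', '13') rather than integers. If that matters, a minimal sketch that converts only the value column could look like the following. The variable name wide is just illustrative, and the sample data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'indx': [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13],
    'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'],
    'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35],
})

# Convert only 'value' to strings so that 'indx' keeps its integer dtype.
wide = (
    df.assign(value=df['value'].astype(str))
      .groupby(['indx', 'param'])['value']
      .agg(','.join)   # join duplicates of the same (indx, param) pair
      .unstack()       # move 'param' from the index into the columns
)
print(wide)
```

Cells with no matching (indx, param) pair are still filled with NaN by unstack(), as in the output above.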

Option 2:

df_out = df.set_index(['indx', 'param', df.groupby(['indx','param']).cumcount()])['value'].unstack([1,2])
df_out.columns = [f'{i}_{j}' if j != 0 else f'{i}' for i, j in df_out.columns]
df_out.reset_index()

Output:

   indx      a     b   b_1     c     d   c_1
0    11  100.0  54.0  65.0  65.0   NaN   NaN
1    12  789.0  24.0   NaN   NaN  98.0   NaN
2    13   24.0  27.0   NaN  75.0   NaN  35.0
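The values show up as floats because NaN forces the columns to float64, and the columns come out in a slightly unexpected order (d before c_1). If integer values and alphabetical columns are preferred, one possible variation, not part of the original answer, is to cast to the nullable 'Int64' dtype and sort the columns. The names occurrence and wide below are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'indx': [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13],
    'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'],
    'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35],
})

# Number repeated (indx, param) pairs: 0 for the first occurrence, 1 for the second, ...
occurrence = df.groupby(['indx', 'param']).cumcount()

wide = df.set_index(['indx', 'param', occurrence])['value'].unstack([1, 2])

# Flatten the (param, occurrence) column pairs into names such as 'b' and 'b_1'.
wide.columns = [f'{p}_{n}' if n != 0 else f'{p}' for p, n in wide.columns]

# Nullable integers keep 54 as 54 (not 54.0) and show gaps as <NA>.
wide = wide.astype('Int64').sort_index(axis=1).reset_index()
print(wide)
```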

Answer 2

OK, I found a solution (the df.pivot_table method exists for exactly this case, and it accepts different aggregation functions):

df.pivot_table(index='indx', columns='param',values='value', aggfunc=lambda x: ','.join(x.astype(str)) )

indx  a    b    c    d
11    100 54,65 65   NaN
12    789  24   NaN  98
13    24   27  75,35 NaN
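The explicit aggfunc matters here: pivot_table defaults to aggfunc='mean', so without it the duplicated values would be silently averaged instead of kept. A small sketch comparing the two on the question's data (the names averaged and joined are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'indx': [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13],
    'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'],
    'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35],
})

# Default aggfunc is 'mean': the two 'b' values for indx 11 are averaged.
averaged = df.pivot_table(index='indx', columns='param', values='value')

# Joining the values as strings keeps every duplicate visible.
joined = df.pivot_table(
    index='indx', columns='param', values='value',
    aggfunc=lambda s: ','.join(s.astype(str)),
)

print(averaged)
print(joined)
```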
