Pandas: Pivoting With Duplicate Values

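I want to pivot a DataFrame that has duplicate values in one column so that the associated values appear in new columns, as in the example below. I couldn't work out from the Pandas documentation how to get from this...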

name   car    model
rob    mazda  626
rob    bmw    328
james  audi   a4
james  VW     golf
tom    audi   a6
tom    ford   focus

to this...

name   car_1  model_1  car_2  model_2
rob    mazda  626      bmw    328
james  audi   a4       VW     golf
tom    audi   a6       ford   focus
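
# Select the columns with a list; newer pandas versions reject the tuple form ['car','model']
x = df.groupby('name')[['car','model']] \
      .apply(lambda x: pd.DataFrame(x.values.tolist(),
             columns=['car','model'])) \
      .unstack()
# Flatten the (column, position) MultiIndex into labels such as 'car_0'
x.columns = ['{0[0]}_{0[1]}'.format(tup) for tup in x.columns]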

Result:

In [152]: x
Out[152]:
       car_0 car_1 model_0 model_1
name
james   audi    VW      a4    golf
rob    mazda   bmw     626     328
tom     audi  ford      a6   focus

How to sort the columns:

In [157]: x.loc[:, x.columns.str[::-1].sort_values().str[::-1]]
Out[157]:
      model_0  car_0 model_1 car_1
name
james      a4   audi    golf    VW
rob       626  mazda     328   bmw
tom        a6   audi   focus  ford
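The sort above works because reversing each label puts the numeric suffix first, so an ordinary string sort groups the columns by suffix. A minimal sketch of just that step, using the four labels from the output above (the variable names are illustrative):

```python
import pandas as pd

cols = pd.Index(["car_0", "car_1", "model_0", "model_1"])

# Reverse every label so the suffix leads: 'car_0' -> '0_rac'
reversed_labels = cols.str[::-1]

# Sort the reversed labels, then reverse them back to the original spelling
ordered = reversed_labels.sort_values().str[::-1]

print(list(ordered))
```

Note that this is a plain string comparison, so it is probably only reliable while the suffixes are single digits; with ten or more repeats per name a numeric sort key would be safer.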

We can use groupby with cumcount.

Set the index:
i = df.groupby('name').cumcount() + 1
df.set_index(['name', i]).unstack()

         car       model       
           1     2     1      2
name                           
james   audi    VW    a4   golf
rob    mazda   bmw   626    328
tom     audi  ford    a6  focus
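To see what cumcount contributes, here is a self-contained sketch that rebuilds the sample data from the question and inspects the intermediate counter. The DataFrame construction is my own reconstruction of the question's table; all models are stored as strings.

```python
import pandas as pd

df = pd.DataFrame({
    "name":  ["rob", "rob", "james", "james", "tom", "tom"],
    "car":   ["mazda", "bmw", "audi", "VW", "audi", "ford"],
    "model": ["626", "328", "a4", "golf", "a6", "focus"],
})

# cumcount numbers the rows inside each group starting at 0; +1 makes it 1-based
i = df.groupby("name").cumcount() + 1
print(i.tolist())

# (name, i) is unique per row, so unstack can move i into the columns
wide = df.set_index(["name", i]).unstack()
print(wide)
```

The counter is what makes the index unique: without it, 'name' alone repeats and there is nothing for unstack to spread across columns.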

Or we can flatten the pd.MultiIndex:

i = df.groupby('name').cumcount() + 1
d1 = df.set_index(['name', i]).unstack().sort_index(axis=1, level=1)
d1.columns = d1.columns.to_series().map('{0[0]}_{0[1]}'.format)
d1


       car_1 model_1 car_2 model_2
name                              
james   audi      a4    VW    golf
rob    mazda     626   bmw     328
tom     audi      a6  ford   focus
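The steps above can be wrapped into a small reusable function. This is only a sketch: `pivot_duplicates` is a hypothetical helper name, not part of pandas, and the f-string flattening is a substitute for the `to_series().map(...)` call used above.

```python
import pandas as pd

def pivot_duplicates(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Spread repeated rows per `key` into numbered columns (hypothetical helper)."""
    # Position of each row within its group: 1, 2, ...
    pos = df.groupby(key).cumcount() + 1
    wide = df.set_index([key, pos]).unstack()
    # Order columns by position first, then by original column name
    wide = wide.sort_index(axis=1, level=1)
    # Collapse ('car', 1) into 'car_1'
    wide.columns = [f"{name}_{n}" for name, n in wide.columns]
    return wide

df = pd.DataFrame({
    "name":  ["rob", "rob", "james", "james", "tom", "tom"],
    "car":   ["mazda", "bmw", "audi", "VW", "audi", "ford"],
    "model": ["626", "328", "a4", "golf", "a6", "focus"],
})

result = pivot_duplicates(df, "name")
print(result)
```

If some names have fewer rows than others, the missing positions should come out as NaN rather than raising an error.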