Python: Pandas pivot table for multiple columns at once which have duplicate values

Question

I have a pandas DataFrame with the columns name, school and marks:

name  school  marks

tom     HBS     55
tom     HBS     55
tom     HBS     14
mark    HBS     28
mark    HBS     19
lewis   HBS     88

How can I reshape it into this?

name  school  marks_1 marks_2 marks_3

tom     HBS     55     55       14
mark    HBS     28     19
lewis   HBS     88

I tried this:

df = df.pivot_table(index='name', values='marks', columns='school') \
    .reset_index() \
    .rename_axis(None, axis=1)

print(df)

df = df.pivot(index='name', columns='marks', values='school')

I have checked these links:

I get this error because of the duplicate values. How can I handle the duplicates when they must be kept?

ValueError: Index contains duplicate entries, cannot reshape
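A minimal reproduction of the error, assuming the sample data from the table above: pivot needs exactly one value per (index, column) cell, but tom and mark each appear more than once for the same school, so pandas refuses to reshape.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['tom', 'tom', 'tom', 'mark', 'mark', 'lewis'],
    'school': ['HBS'] * 6,
    'marks': [55, 55, 14, 28, 19, 88],
})

# Rows whose (name, school) pair also occurs in another row
dupes = df.duplicated(['name', 'school'], keep=False)
print(dupes.sum())  # 5 (three 'tom' rows and two 'mark' rows)

# pivot cannot place several marks into a single (name, school) cell
try:
    df.pivot(index='name', columns='school', values='marks')
    raised = False
except ValueError as err:
    raised = True
    print(err)
```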

Answer 1

Try using set_index and unstack together with groupby and cumcount:

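# number the rows within each (name, school) group: 1, 2, 3, ...
counter = df.groupby(['name', 'school']).cumcount() + 1
df_out = df.set_index(['name', 'school', counter]).unstack()
# flatten the ('marks', 1) style column tuples into 'marks_1'
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out = df_out.reset_index()
df_out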

Output:

    name school  marks_1  marks_2  marks_3
0  lewis    HBS     88.0      NaN      NaN
1   mark    HBS     28.0     19.0      NaN
2    tom    HBS     55.0     55.0     14.0
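To see why this works, here is a small sketch of the intermediate counter on the question's data: cumcount numbers the rows inside each group in their original order, so appending it to the key makes every (name, school, counter) combination unique, and unstack then has exactly one value per cell.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['tom', 'tom', 'tom', 'mark', 'mark', 'lewis'],
    'school': ['HBS'] * 6,
    'marks': [55, 55, 14, 28, 19, 88],
})

# cumcount yields 0, 1, 2, ... within each (name, school) group; +1 starts at 1
counter = df.groupby(['name', 'school']).cumcount() + 1
print(counter.tolist())  # [1, 2, 3, 1, 2, 1]

# With the counter in the key there are no duplicate keys left
keyed = df.assign(counter=counter)
print(keyed.duplicated(['name', 'school', 'counter']).any())  # False
```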

Answer 2

The cumcount function lets you create a unique index before pivoting. This builds on the same idea as @ScottBoston's answer; however, the pivot function is used here:

index = ['name', 'school']

# create an extra column for uniqueness (running count within each group,
# as strings so the column labels can be joined below)
temp = (df.assign(counter=df.groupby(index)
                            .cumcount()
                            .add(1)
                            .astype(str))
          .pivot(index=index, columns='counter'))

# flatten the columns: ('marks', '1') -> 'marks_1'
temp.columns = temp.columns.map('_'.join)

temp.reset_index()

    name school  marks_1  marks_2  marks_3
0  lewis    HBS     88.0      NaN      NaN
1   mark    HBS     28.0     19.0      NaN
2    tom    HBS     55.0     55.0     14.0
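Both outputs show floats (88.0) because the missing cells are filled with NaN, which forces float64. If whole numbers are wanted, one option, sketched here on the question's data, is to cast the mark columns to pandas' nullable Int64 dtype, which shows the gaps as <NA> instead.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['tom', 'tom', 'tom', 'mark', 'mark', 'lewis'],
    'school': ['HBS'] * 6,
    'marks': [55, 55, 14, 28, 19, 88],
})

index = ['name', 'school']
temp = (df.assign(counter=df.groupby(index).cumcount().add(1).astype(str))
          .pivot(index=index, columns='counter'))
temp.columns = temp.columns.map('_'.join)
temp = temp.reset_index()

# NaN padding made these float64; nullable Int64 keeps integers and uses <NA>
mark_cols = [c for c in temp.columns if c.startswith('marks_')]
temp = temp.astype({c: 'Int64' for c in mark_cols})
print(temp)
```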

Alternatively, you can use the pivot_wider function from pyjanitor, which is syntactic sugar around pd.pivot with a few extra helpers:

# pip install pyjanitor
import pandas as pd
import janitor
(df.assign(counter = df.groupby(index)
                       .cumcount()
                       .add(1))                              
   .pivot_wider(index = index, 
                names_from = 'counter', 
                names_sep = '_')
)

    name school  marks_1  marks_2  marks_3
0  lewis    HBS     88.0      NaN      NaN
1   mark    HBS     28.0     19.0      NaN
2    tom    HBS     55.0     55.0     14.0
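Because pivot is called without values, the same cumcount approach spreads every remaining column at once. The sketch below assumes a hypothetical second value column, grade, which is not in the question and is added only to illustrate the multi-column case; each value column gets its own numbered set of columns.

```python
import pandas as pd

# 'grade' is a hypothetical extra value column, for illustration only
df = pd.DataFrame({
    'name': ['tom', 'tom', 'tom', 'mark', 'mark', 'lewis'],
    'school': ['HBS'] * 6,
    'marks': [55, 55, 14, 28, 19, 88],
    'grade': ['B', 'B', 'F', 'E', 'F', 'A'],
})

index = ['name', 'school']
# no values= argument: both 'marks' and 'grade' are spread by the counter
wide = (df.assign(counter=df.groupby(index).cumcount().add(1).astype(str))
          .pivot(index=index, columns='counter'))
wide.columns = wide.columns.map('_'.join)  # e.g. ('grade', '3') -> 'grade_3'
wide = wide.reset_index()
print(wide)
```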

Tags: python, pivot, group-by, dataframe, pandas