Python: Pandas pivot table for multiple columns at once when the data has duplicate values
I have a pandas DataFrame with the columns name, school, and marks:
name school marks
tom HBS 55
tom HBS 55
tom HBS 14
mark HBS 28
mark HBS 19
lewis HBS 88
How can I reshape it into this?
name school marks_1 marks_2 marks_3
tom HBS 55 55 14
mark HBS 28 19
lewis HBS 88
I tried this:
df = df.pivot_table(index='name', values='marks', columns='school') \
.reset_index() \
.rename_axis(None, axis=1)
print(df)
df = df.pivot('name','marks','school')
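I have looked at related questions, but I get the error below because of the duplicate values. How can this be handled when duplicates exist and must be kept?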
ValueError: Index contains duplicate entries, cannot reshape
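For context, here is a minimal sketch that rebuilds the sample data and reproduces the error. The DataFrame constructor call and the names `dupes` and `error_message` are illustrative assumptions, not part of the original post. Note that `pivot_table` would not raise here, because it aggregates duplicate keys (with the mean by default) instead of keeping them.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS", "HBS", "HBS", "HBS", "HBS", "HBS"],
    "marks": [55, 55, 14, 28, 19, 88],
})

# Rows whose (name, school) key occurs more than once; keep=False flags every copy.
dupes = df[df.duplicated(subset=["name", "school"], keep=False)]
print(dupes)

# pivot requires each (index, columns) pair to be unique, so this raises ValueError.
error_message = None
try:
    df.pivot(index="name", columns="school", values="marks")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```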
Try using set_index and unstack together with groupby and cumcount:
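# number the rows within each (name, school) group, then unstack that level
df_out = df.set_index(['name',
                       'school',
                       df.groupby(['name', 'school']).cumcount() + 1]).unstack()
# flatten the (column, counter) MultiIndex into names such as marks_1
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out = df_out.reset_index()
df_out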
Output:
name school marks_1 marks_2 marks_3
0 lewis HBS 88.0 NaN NaN
1 mark HBS 28.0 19.0 NaN
2 tom HBS 55.0 55.0 14.0
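The marks come back as floats because the missing cells are filled with NaN. If whole numbers are wanted, as in the desired output of the question, one option is pandas' nullable `Int64` dtype. The following is a self-contained sketch of the same set_index/unstack approach with that extra cast; the names `counter` and `wide` are illustrative, and the cast assumes every mark is a whole number.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS", "HBS", "HBS", "HBS", "HBS", "HBS"],
    "marks": [55, 55, 14, 28, 19, 88],
})

# Number each row within its (name, school) group: 1, 2, 3, ...
counter = df.groupby(["name", "school"]).cumcount() + 1

# Use the counter as a third index level and move it into the columns.
wide = df.set_index(["name", "school", counter]).unstack()
wide.columns = [f"{col}_{n}" for col, n in wide.columns]

# Assumption: marks are whole numbers, so the nullable integer dtype is safe.
wide = wide.astype("Int64").reset_index()
print(wide)
```

Missing cells then show up as `<NA>` rather than `NaN`, and the present values stay integers.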
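The cumcount function lets you create a unique index before pivoting. This builds on the same idea as @ScottBoston's answer; however, it uses the pivot function instead: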
index = ['name', 'school']
# create an extra column for uniqueness
temp = (df.assign(counter = df.groupby(index)
                              .cumcount()
                              .add(1)
                              .astype(str))
          .pivot(index = index, columns = 'counter')
       )
# flatten the columns
temp.columns = temp.columns.map('_'.join)
temp.reset_index()
name school marks_1 marks_2 marks_3
0 lewis HBS 88.0 NaN NaN
1 mark HBS 28.0 19.0 NaN
2 tom HBS 55.0 55.0 14.0
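To see why the extra column removes the error, it can help to look at the intermediate frame before pivoting. This is a small sketch under the same sample data; `with_counter` is an illustrative name. The cast to `str` matters only for the flattening step, since `'_'.join` needs every element of a column tuple to be a string.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS", "HBS", "HBS", "HBS", "HBS", "HBS"],
    "marks": [55, 55, 14, 28, 19, 88],
})

index = ["name", "school"]

# cumcount numbers the rows of each group from 0 in their original order; add(1) starts at 1.
with_counter = df.assign(counter=df.groupby(index).cumcount().add(1).astype(str))
print(with_counter)

# Each (name, school, counter) combination now occurs exactly once,
# which is the uniqueness that pivot requires.
has_duplicates = with_counter.duplicated(subset=index + ["counter"]).any()
print(has_duplicates)
```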
Alternatively, you could use the pivot_wider function from pyjanitor, which is syntactic sugar around pd.pivot, with some helpers:
# pip install pyjanitor
import pandas as pd
import janitor
(df.assign(counter = df.groupby(index)
                       .cumcount()
                       .add(1))
   .pivot_wider(index = index,
                names_from = 'counter',
                names_sep = '_')
)
name school marks_1 marks_2 marks_3
0 lewis HBS 88.0 NaN NaN
1 mark HBS 28.0 19.0 NaN
2 tom HBS 55.0 55.0 14.0