How do you create a pivot table with a limited number of new columns using pandas?

Question

For example, I have this DataFrame:

emp_id	label
a1	101
a1	102
a1	103
a1	104
a2	420
a2	17
a3	99
a3	100
a3	101

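I want to create a new DataFrame, similar to a pivot table, where the number of columns is fixed and any rows beyond that limit are dropped.

For example, if my column limit is 3, the output DataFrame I want would be: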

emp_id	label_1	label_2	label_3
a1	101	102	103
a2	420	17	NaN
a3	99	100	101

The 4th entry for a1 is dropped, and the 3rd entry for a2 is NaN.

How can I achieve this result?

Answer 1

You can group by emp_id and use cumcount:

df['group'] = df.groupby('emp_id').cumcount().add(1)
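To make the effect of cumcount concrete, here is a minimal sketch that applies it to the sample data from the question. The name `group_values` is only an illustrative helper variable, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'emp_id': ['a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'label': [101, 102, 103, 104, 420, 17, 99, 100, 101]
})

# cumcount numbers the rows within each emp_id starting at 0;
# add(1) shifts the numbering so it starts at 1
df['group'] = df.groupby('emp_id').cumcount().add(1)

# Illustrative variable (not in the original answer) holding the counter values
group_values = df['group'].tolist()
print(df)
```

Each emp_id gets its own running counter, so a1 is numbered 1 to 4, a2 is numbered 1 to 2, and a3 is numbered 1 to 3.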

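Now you should have a new column containing the numbers to use as the pivot column names. You can keep only the rows whose number is within your limit and then perform the pivot.

Something like: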

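df = df[df['group'].le(3)]  # keep only the first 3 entries per emp_id
df = df.pivot(index='emp_id', columns='group', values='label')  # pivot returns a new frame, so assign it
df = df.add_prefix('label_')  # columns 1, 2, 3 become label_1, label_2, label_3
df = df.reset_index()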

Note: I was not able to test this code.
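For reference, the steps in this answer can be put together into one runnable sketch. The function name `limited_pivot` and its parameters are hypothetical, chosen only for illustration. The `reindex` step is an added assumption: it keeps the number of columns fixed at `limit` even when no group has that many entries.

```python
import pandas as pd

def limited_pivot(df, limit, id_col='emp_id', value_col='label'):
    """Hypothetical helper: spread the first `limit` values per id into columns."""
    out = df.copy()
    # Number the rows within each id, starting at 1
    out['group'] = out.groupby(id_col).cumcount().add(1)
    # Drop every row beyond the limit
    out = out[out['group'].le(limit)]
    # Go to wide format: one row per id, one column per position
    wide = out.pivot(index=id_col, columns='group', values=value_col)
    # Assumption: always emit exactly `limit` columns, filling absent ones with NaN
    wide = wide.reindex(columns=range(1, limit + 1))
    # Rename 1, 2, 3, ... to label_1, label_2, label_3, ...
    wide = wide.add_prefix(f'{value_col}_')
    # Remove the leftover columns-axis name and restore the id as a column
    return wide.rename_axis(columns=None).reset_index()

df = pd.DataFrame({
    'emp_id': ['a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'label': [101, 102, 103, 104, 420, 17, 99, 100, 101]
})

result = limited_pivot(df, limit=3)
print(result)
```

With the sample data, the 4th entry for a1 is dropped and the missing 3rd entry for a2 shows up as NaN.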

Answer 2

A similar approach: we can use groupby cumcount to enumerate the rows of each emp_id and use pivot_table to go to wide format, filtering the groups based on the specified limit. Then use Index.map to format the column headers:

# Set limit
limit = 3
# Create Groups
groups = df.groupby('emp_id').cumcount() + 1
# Pivot to wide format with new columns
df = df.pivot_table(index='emp_id',
                    columns=groups[groups.le(limit)],  #  Limit groups
                    values='label')
# Update Column Labels
df.columns = df.columns.map('label_{:.0f}'.format)
# Reset Index
df = df.reset_index()

df:

  emp_id  label_1  label_2  label_3
0     a1    101.0    102.0    103.0
1     a2    420.0     17.0      NaN
2     a3     99.0    100.0    101.0
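The values above are displayed as floats (101.0 rather than 101) because a regular integer column cannot hold NaN. If integer values are preferred, one possible option is pandas' nullable `Int64` dtype. The sketch below is an illustration, not part of the original answer: it builds the wide frame with a plain filter followed by `pivot` and then converts the result. The variable name `wide` is an assumption introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    'emp_id': ['a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'label': [101, 102, 103, 104, 420, 17, 99, 100, 101]
})

limit = 3
groups = df.groupby('emp_id').cumcount() + 1

# Attach the per-id counter, keep rows within the limit, then pivot to wide format
wide = (df.assign(group=groups)
          .loc[lambda d: d['group'].le(limit)]
          .pivot(index='emp_id', columns='group', values='label'))
wide.columns = wide.columns.map('label_{}'.format)

# Convert to the nullable integer dtype so missing cells become <NA> instead of NaN
wide = wide.astype('Int64').rename_axis(columns=None).reset_index()
print(wide)
```

The missing 3rd entry for a2 is then shown as `<NA>`, and the remaining values are displayed as integers.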

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'emp_id': ['a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'label': [101, 102, 103, 104, 420, 17, 99, 100, 101]
})

python

pandas

dataframe

pivot-table