Aggregate only one of the duplicated values with groupby pandas

Question

I have the following data; the last column is the desired output:

activity	teacher	group	students	the desired column
One	A	a	3	5
One	B	b	2	5
two	A	c	7	7
One	D	a	3	5
two	C	c	7	7

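I want to group by activity and, when a group has more than one teacher, return the number of students without counting the same students twice. I tried the following, but it adds the same group's students once per repeated row: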

df.groupby('activity').students.transform('sum')

The output is as follows:

activity	teacher	group	students	the output column
One	A	a	3	8
One	B	b	2	8
two	A	c	7	14
One	D	a	3	8
two	C	c	7	14
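
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above as a DataFrame (the construction is an assumption; only the column names and values come from the data shown) and runs the same transform. Group a appears twice under activity One, so its 3 students are added twice: 3 + 2 + 3 = 8.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "activity": ["One", "One", "two", "One", "two"],
        "teacher": ["A", "B", "A", "D", "C"],
        "group": ["a", "b", "c", "a", "c"],
        "students": [3, 2, 7, 3, 7],
    }
)

# Plain per-activity sum broadcast back to every row:
# rows that share the same group are each counted.
naive = df.groupby("activity")["students"].transform("sum")
print(naive.tolist())  # [8, 8, 14, 8, 14]
```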

Thanks in advance for any suggestions.

Answer 1

IIUC:

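# Keep one row per (activity, group), then sum students per activity.
x = (
    df.drop_duplicates(subset=["activity", "group"])
    .groupby("activity")["students"]
    .sum()
)
# Map each row's activity to its de-duplicated total.
df["the desired column"] = df["activity"].map(x)
print(df)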

Prints:

  activity teacher group  students  the desired column
0      One       A     a         3                   5
1      One       B     b         2                   5
2      two       A     c         7                   7
3      One       D     a         3                   5
4      two       C     c         7                   7
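
If it helps, the same idea can be sketched as a self-contained script. The helper name `unique_group_total` and its parameters are made up for illustration, and the sketch assumes the student count is the same on every row of a given (activity, group) pair, since `drop_duplicates` keeps only the first such row.

```python
import pandas as pd


def unique_group_total(df, by="activity", key="group", value="students"):
    """Sum `value` per `by`, counting each (`by`, `key`) pair only once.

    Hypothetical helper; assumes `value` is identical across the
    duplicated rows of a pair, because only the first row is kept.
    """
    totals = (
        df.drop_duplicates(subset=[by, key])  # one row per pair
        .groupby(by)[value]
        .sum()
    )
    # Broadcast the per-activity totals back onto the original rows.
    return df[by].map(totals)


# Sample data copied from the question.
df = pd.DataFrame(
    {
        "activity": ["One", "One", "two", "One", "two"],
        "teacher": ["A", "B", "A", "D", "C"],
        "group": ["a", "b", "c", "a", "c"],
        "students": [3, 2, 7, 3, 7],
    }
)

result = unique_group_total(df)
print(result.tolist())  # [5, 5, 7, 5, 7]
```

Mapping the totals back with `map` leaves the original rows and their order untouched, which is why no merge is needed here.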
