Aggregate only one of the duplicated values with groupby pandas
I have the following data; the last column is the desired output:
activity | teacher | group | students | the desired column |
---|---|---|---|---|
One | A | a | 3 | 5 |
One | B | b | 2 | 5 |
two | A | c | 7 | 7 |
One | D | a | 3 | 5 |
two | C | c | 7 | 7 |
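I want to group by activity and, when an activity has several teachers, return the number of students without counting the same students twice. I tried the following, but it adds the same group to the sum more than once.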
df.groupby('activity').students.transform('sum')
The output is as follows:
activity | teacher | group | students | the output column |
---|---|---|---|---|
One | A | a | 3 | 8 |
One | B | b | 2 | 8 |
two | A | c | 7 | 14 |
One | D | a | 3 | 8 |
two | C | c | 7 | 14 |
Thanks in advance for any suggestions.
IIUC:
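# Keep one row per (activity, group) pair, then sum students per activity.
x = (
    df.drop_duplicates(subset=["activity", "group"])
    .groupby("activity")["students"]
    .sum()
)
# Map each row's activity to its deduplicated total.
df["the desired column"] = df["activity"].map(x)
print(df)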
Prints:
  activity teacher group  students  the desired column
0      One       A     a         3                   5
1      One       B     b         2                   5
2      two       A     c         7                   7
3      One       D     a         3                   5
4      two       C     c         7                   7
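If you would rather skip the intermediate Series, the same idea can probably be written as a single transform. This is only a sketch: the DataFrame is rebuilt by hand from the table in the question, `is_repeat` is a name I made up, and it assumes that rows sharing the same activity and group always carry the same `students` value.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption about the real df).
df = pd.DataFrame(
    {
        "activity": ["One", "One", "two", "One", "two"],
        "teacher": ["A", "B", "A", "D", "C"],
        "group": ["a", "b", "c", "a", "c"],
        "students": [3, 2, 7, 3, 7],
    }
)

# True for the second and later occurrences of an (activity, group) pair.
is_repeat = df.duplicated(subset=["activity", "group"])

# Replace repeated groups with 0, then broadcast the per-activity sum to every row.
df["the desired column"] = (
    df["students"].mask(is_repeat, 0).groupby(df["activity"]).transform("sum")
)
print(df)
```

Both versions count each (activity, group) pair once. The `map` version is easier to read, while this one keeps everything aligned to the original index.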