Adding a column in pandas with several conditions based on other columns in a DataFrame
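First of all, I apologize if this is already somewhere on Stack Overflow; I experimented on my own for an hour and then searched for another hour without finding it. I'm sure there must be an elegant (and probably basic) solution.
I have the following DataFrame: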
Admit Gender Dept Freq
0 Admitted Male A 512
1 Rejected Male A 313
2 Admitted Female A 89
3 Rejected Female A 19
4 Admitted Male B 353
5 Rejected Male B 207
6 Admitted Female B 17
7 Rejected Female B 8
8 Admitted Male C 120
9 Rejected Male C 205
10 Admitted Female C 202
11 Rejected Female C 391
12 Admitted Male D 138
13 Rejected Male D 279
14 Admitted Female D 131
15 Rejected Female D 244
16 Admitted Male E 53
17 Rejected Male E 138
18 Admitted Female E 94
19 Rejected Female E 299
20 Admitted Male F 22
21 Rejected Male F 351
22 Admitted Female F 24
23 Rejected Female F 317
I want to add a column 'Proportion' that gives the proportion of admitted/rejected applicants by gender within each department.
So that:
df.loc[0, 'Proportion'] = 512/(512+313) = 0.6206
df.loc[1, 'Proportion'] = 313/(512+313) = 0.3794
...
and so on.
I tried to start by adding a 'total' column using variations of:
data.groupby(['Dept', 'Gender'])[['Freq']].sum()
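But I can't seem to look up values in this grouped DataFrame using the values in each row of the original DataFrame.
I also tried using a lambda function, but got a 'function is not iterable' error.
I suppose one could loop over it row by row since it's a small dataset, but that won't be an option when I need to do this kind of thing in the future.
Please help a newbie and aspiring data scientist.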
To get a Series the same size as the original DataFrame, use transform, then divide the column by it with div:
data['new'] = data['Freq'].div(data.groupby(['Dept', 'Gender'])['Freq'].transform('sum'))
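Or use apply with a custom function (group_keys=False keeps the original index, so the result aligns with data on recent pandas versions):
data['new'] = data.groupby(['Dept', 'Gender'], group_keys=False)['Freq'].apply(lambda x: x / x.sum())
print(data)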
Admit Gender Dept Freq new
0 Admitted Male A 512 0.620606
1 Rejected Male A 313 0.379394
2 Admitted Female A 89 0.824074
3 Rejected Female A 19 0.175926
4 Admitted Male B 353 0.630357
5 Rejected Male B 207 0.369643
6 Admitted Female B 17 0.680000
7 Rejected Female B 8 0.320000
8 Admitted Male C 120 0.369231
9 Rejected Male C 205 0.630769
10 Admitted Female C 202 0.340641
11 Rejected Female C 391 0.659359
12 Admitted Male D 138 0.330935
13 Rejected Male D 279 0.669065
14 Admitted Female D 131 0.349333
15 Rejected Female D 244 0.650667
16 Admitted Male E 53 0.277487
17 Rejected Male E 138 0.722513
18 Admitted Female E 94 0.239186
19 Rejected Female E 299 0.760814
20 Admitted Male F 22 0.058981
21 Rejected Male F 351 0.941019
22 Admitted Female F 24 0.070381
23 Rejected Female F 317 0.929619
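For anyone who wants to try this without the full table, here is a minimal, self-contained sketch. It assumes only departments A and B from the question's data, and it names the new column 'Proportion' as the question asked rather than 'new'. The merge-based variant is not part of the answer above; it is added only to illustrate the "look up the group total for each row" idea the question was reaching for.

```python
import pandas as pd

# A small subset of the question's data (departments A and B only).
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})

# transform('sum') returns one group total per row, aligned with data's index.
totals = data.groupby(["Dept", "Gender"])["Freq"].transform("sum")
data["Proportion"] = data["Freq"] / totals

# Alternative (illustrative): aggregate to one row per group,
# then merge the totals back onto the original rows.
group_totals = (
    data.groupby(["Dept", "Gender"], as_index=False)["Freq"]
    .sum()
    .rename(columns={"Freq": "Total"})
)
merged = data.merge(group_totals, on=["Dept", "Gender"], how="left")
merged["Proportion_via_merge"] = merged["Freq"] / merged["Total"]

print(data)
```

Both routes should give the same numbers; transform is usually the shorter one because it skips the intermediate table and the merge.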