Create column on two conditions with pandas

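I'm doing some analysis exercises with pandas. I want to create a new column whose value is the sum of two rows. The original dataset is as follows...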

    Admit      Gender   Dept    Freq
0   Admitted    Male    A   512
1   Rejected    Male    A   313
2   Admitted    Female  A   89
3   Rejected    Female  A   19
4   Admitted    Male    B   353
5   Rejected    Male    B   207
6   Admitted    Female  B   17
7   Rejected    Female  B   8
8   Admitted    Male    C   120
9   Rejected    Male    C   205
10  Admitted    Female  C   202
11  Rejected    Female  C   391
12  Admitted    Male    D   138
13  Rejected    Male    D   279
14  Admitted    Female  D   131
15  Rejected    Female  D   244
16  Admitted    Male    E   53
17  Rejected    Male    E   138
18  Admitted    Female  E   94
19  Rejected    Female  E   299
20  Admitted    Male    F   22
21  Rejected    Male    F   351
22  Admitted    Female  F   24
23  Rejected    Female  F   317

I want to create the new column using the following dataframe...

    Dept    Gender  Freq
0   A   Female  108
1   A   Male    825
2   B   Female  25
3   B   Male    560
4   C   Female  593
5   C   Male    325
6   D   Female  375
7   D   Male    417
8   E   Female  393
9   E   Male    191
10  F   Female  341
11  F   Male    373
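The second dataframe appears to be the first one summed per (Dept, Gender) pair: 512 + 313 = 825 for Male in A, 89 + 19 = 108 for Female in A, and so on. A minimal sketch of how it could be produced, assuming the column names shown above and using only departments A and B for brevity:

```python
import pandas as pd

# Subset of the question's data (departments A and B only)
data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})

# Sum Freq per (Dept, Gender); as_index=False keeps the keys as ordinary columns
total_freq = data.groupby(["Dept", "Gender"], as_index=False)["Freq"].sum()
print(total_freq)
```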

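I want to use the Freq column of the second dataframe to create a new column in the first dataframe. Wherever Dept and Gender are the same in both dataframes, I need to insert the matching Freq value (e.g. 108 for Dept A, Female). The new dataframe should look like this...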

    Admit      Gender   Dept    Freq   Total
0   Admitted    Male    A   512        825
1   Rejected    Male    A   313        825
2   Admitted    Female  A   89         108
3   Rejected    Female  A   19         108
4   Admitted    Male    B   353        560
5   Rejected    Male    B   207        560
6   Admitted    Female  B   17         25
7   Rejected    Female  B   8          25 

I tried the following code...

for i in data.iterrows():
    for j in total_freq.iterrows():
        if i[1].Gender == total_freq.Gender & i[1].Dept == total_freq.Dept:
            data['Total'] = total_freq.Freq

I get the following error... TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]
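The error most likely comes from operator precedence: in Python, & binds more tightly than ==, so the condition is parsed as i[1].Gender == (total_freq.Gender & i[1].Dept) == total_freq.Dept. The loop also compares against the whole total_freq columns instead of the row j, and data['Total'] = total_freq.Freq overwrites the entire column on every iteration. A corrected row-by-row version, as a sketch for illustration only (it is slow on large frames), using the departments A and B subset:

```python
import pandas as pd

data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})
total_freq = pd.DataFrame({
    "Dept": ["A", "A", "B", "B"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Freq": [108, 825, 25, 560],
})

totals = []
for _, row in data.iterrows():
    # Parenthesise each comparison, because & binds tighter than ==
    mask = (total_freq["Gender"] == row["Gender"]) & (total_freq["Dept"] == row["Dept"])
    matched = total_freq.loc[mask, "Freq"]
    # Take the single matching value, or None if the pair is missing
    totals.append(matched.iloc[0] if not matched.empty else None)

# Assign the whole column once, after the loop
data["Total"] = totals
print(data)
```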

Any help creating the column with the correct values would be appreciated.

You can use transform:

df['Total'] = df.groupby(['Dept', 'Gender']).Freq.transform('sum')

You get:

    Admit   Gender  Dept    Freq    Total
0   Admitted    Male    A   512 825
1   Rejected    Male    A   313 825
2   Admitted    Female  A   89  108
3   Rejected    Female  A   19  108
4   Admitted    Male    B   353 560
5   Rejected    Male    B   207 560
6   Admitted    Female  B   17  25
7   Rejected    Female  B   8   25
8   Admitted    Male    C   120 325
9   Rejected    Male    C   205 325
10  Admitted    Female  C   202 593
11  Rejected    Female  C   391 593
12  Admitted    Male    D   138 417
13  Rejected    Male    D   279 417
14  Admitted    Female  D   131 375
15  Rejected    Female  D   244 375
16  Admitted    Male    E   53  191
17  Rejected    Male    E   138 191
18  Admitted    Female  E   94  393
19  Rejected    Female  E   299 393
20  Admitted    Male    F   22  373
21  Rejected    Male    F   351 373
22  Admitted    Female  F   24  341
23  Rejected    Female  F   317 341
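Because transform("sum") returns one value per original row, aligned to the original index, it computes the totals straight from the first dataframe, so the second dataframe is not needed at all. A self-contained sketch of the same idea on the departments A and B subset:

```python
import pandas as pd

data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})

# Each row receives the sum of Freq over its (Dept, Gender) group
data["Total"] = data.groupby(["Dept", "Gender"])["Freq"].transform("sum")
print(data)
```

Unlike groupby(...).sum(), which returns one row per group, transform keeps the original shape, which is why the result can be assigned directly as a new column.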

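You can use pandas.DataFrame.merge() to left-join the totals from the second dataframe onto the first. First rename Freq in the totals dataframe (df1) to Total, so it doesn't clash with the Freq column of df.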

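import pandas as pd
df1 = df1.rename(columns={'Freq': 'Total'})
# pass all of df1 (not just df1['Total']) so Gender and Dept are available as join keys
df_totals = pd.merge(df, df1, how='left', on=['Gender', 'Dept'])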
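A self-contained sketch of the merge approach on the departments A and B subset. The validate="many_to_one" argument is an optional addition not in the answer above: it makes pandas raise an error if a (Gender, Dept) pair appears more than once in the totals, which would otherwise silently duplicate rows.

```python
import pandas as pd

data = pd.DataFrame({
    "Admit": ["Admitted", "Rejected"] * 4,
    "Gender": ["Male", "Male", "Female", "Female"] * 2,
    "Dept": ["A"] * 4 + ["B"] * 4,
    "Freq": [512, 313, 89, 19, 353, 207, 17, 8],
})
total_freq = pd.DataFrame({
    "Dept": ["A", "A", "B", "B"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Freq": [108, 825, 25, 560],
})

# Rename Freq so it does not collide with data's own Freq column
totals = total_freq.rename(columns={"Freq": "Total"})
# Left join keeps every row of data, in its original order
result = pd.merge(data, totals, how="left", on=["Gender", "Dept"], validate="many_to_one")
print(result)
```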