Dataframe: applying a function to rows with a specific condition
Here is a sample of my dataframe:
id DPT_DATE TRANCHE_NO TRAIN_NO J_X RES_HOLD_IND
0 2017-04-01 330.0 1234.0 -1.0 100.0
1 2017-04-01 330.0 1234.0 0.0 80.0
2 2017-04-02 331.0 1235.0 -1.0 91.0
3 2017-04-02 331.0 1235.0 0.0 83.0
4 2017-04-03 332.0 1236.0 -1.0 92.0
5 2017-04-03 332.0 1236.0 0.0 81.0
6 2017-04-04 333.0 1237.0 -1.0 87.0
7 2017-04-04 333.0 1237.0 0.0 70.0
8 2017-04-05 334.0 1238.0 -1.0 93.0
9 2017-04-05 334.0 1238.0 0.0 90.0
10 2017-04-06 335.0 1239.0 -1.0 89.0
11 2017-04-06 335.0 1239.0 0.0 85.0
12 2017-04-07 336.0 1240.0 -1.0 82.0
13 2017-04-07 336.0 1240.0 0.0 76.0
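This is a dataframe of train bookings: DPT_DATE = departure date, TRAIN_NO = train number, J_X = days before departure (J_X = 0.0 means the day of departure, J_X = -1 means the day after departure), and RES_HOLD_IND is the number of reservations held on that day.
For each DPT_DATE and TRAIN_NO, I want to create a new column that gives me the RES_HOLD_IND of the day where J_X = -1.
Example (this is what I want):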
id DPT_DATE TRANCHE_NO TRAIN_NO J_X RES_HOLD_IND RES_J-1
0 2017-04-01 330.0 1234.0 -1.0 100.0 100.0
1 2017-04-01 330.0 1234.0 0.0 80.0 100.0
2 2017-04-02 331.0 1235.0 -1.0 91.0 91.0
3 2017-04-02 331.0 1235.0 0.0 83.0 91.0
4 2017-04-03 332.0 1236.0 -1.0 92.0 92.0
5 2017-04-03 332.0 1236.0 0.0 81.0 92.0
6 2017-04-04 333.0 1237.0 -1.0 87.0 87.0
7 2017-04-04 333.0 1237.0 0.0 70.0 87.0
Thanks for your help!
I think you need to filter first by boolean indexing or query, and then groupby with DataFrameGroupBy.ffill. This works well if the -1 value is always in the first row of each group:
df['RES_J-1'] = df.query('J_X == -1')['RES_HOLD_IND']
#alternative
#df['RES_J-1'] = df.loc[df['J_X'] == -1, 'RES_HOLD_IND']
df['RES_J-1'] = df.groupby(['DPT_DATE','TRAIN_NO'])['RES_J-1'].ffill()
print (df)
DPT_DATE TRANCHE_NO TRAIN_NO J_X RES_HOLD_IND RES_J-1
0 2017-04-01 330.0 1234.0 -1.0 100.0 100.0
1 2017-04-01 330.0 1234.0 0.0 80.0 100.0
2 2017-04-02 331.0 1235.0 -1.0 91.0 91.0
3 2017-04-02 331.0 1235.0 0.0 83.0 91.0
4 2017-04-03 332.0 1236.0 -1.0 92.0 92.0
5 2017-04-03 332.0 1236.0 0.0 81.0 92.0
6 2017-04-04 333.0 1237.0 -1.0 87.0 87.0
7 2017-04-04 333.0 1237.0 0.0 70.0 87.0
8 2017-04-05 334.0 1238.0 -1.0 93.0 93.0
9 2017-04-05 334.0 1238.0 0.0 90.0 93.0
10 2017-04-06 335.0 1239.0 -1.0 89.0 89.0
11 2017-04-06 335.0 1239.0 0.0 85.0 89.0
12 2017-04-07 336.0 1240.0 -1.0 82.0 82.0
13 2017-04-07 336.0 1240.0 0.0 76.0 82.0
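For anyone who wants to try this without the full data, here is a minimal self-contained sketch of the same two steps. The sample frame is just the first two trains from the question, rebuilt by hand, so treat it as an illustration rather than the asker's real data.

```python
import pandas as pd

# Hand-built sample: the first two trains from the question's data.
df = pd.DataFrame({
    "DPT_DATE": ["2017-04-01", "2017-04-01", "2017-04-02", "2017-04-02"],
    "TRANCHE_NO": [330.0, 330.0, 331.0, 331.0],
    "TRAIN_NO": [1234.0, 1234.0, 1235.0, 1235.0],
    "J_X": [-1.0, 0.0, -1.0, 0.0],
    "RES_HOLD_IND": [100.0, 80.0, 91.0, 83.0],
})

# Step 1: keep RES_HOLD_IND only on rows where J_X == -1.
# Assignment aligns on the index, so all other rows become NaN.
df["RES_J-1"] = df.loc[df["J_X"] == -1, "RES_HOLD_IND"]

# Step 2: forward-fill the NaN values within each (DPT_DATE, TRAIN_NO) group.
df["RES_J-1"] = df.groupby(["DPT_DATE", "TRAIN_NO"])["RES_J-1"].ffill()

print(df["RES_J-1"].tolist())  # [100.0, 100.0, 91.0, 91.0]
```

The forward fill only reaches rows that come after the J_X == -1 row inside a group, which is why the row order matters here.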
If there is only one -1 row per group but it is not always the first row, use:
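# wrap in parentheses so the statement can continue on the next line;
# transform keeps the original row index, so the result assigns back cleanly
df['RES_J-1'] = (df.groupby(['DPT_DATE','TRAIN_NO'])['RES_J-1']
                   .transform(lambda x: x.ffill().bfill()))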
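A small sketch of that second case, on made-up rows where the J_X == -1 row comes last in the second group. The data is hypothetical, and the sketch uses transform rather than apply so that the result keeps the original row index regardless of pandas version.

```python
import pandas as pd

# Hypothetical sample: in the second group the J_X == -1 row is NOT first.
df = pd.DataFrame({
    "DPT_DATE": ["2017-04-01", "2017-04-01", "2017-04-02", "2017-04-02"],
    "TRAIN_NO": [1234.0, 1234.0, 1235.0, 1235.0],
    "J_X": [-1.0, 0.0, 0.0, -1.0],
    "RES_HOLD_IND": [100.0, 80.0, 83.0, 91.0],
})

# Keep RES_HOLD_IND only where J_X == -1; other rows become NaN.
df["RES_J-1"] = df.loc[df["J_X"] == -1, "RES_HOLD_IND"]

# Fill forward, then backward, inside each group so the single
# non-NaN value reaches every row of its group.
df["RES_J-1"] = (
    df.groupby(["DPT_DATE", "TRAIN_NO"])["RES_J-1"]
      .transform(lambda s: s.ffill().bfill())
)

print(df["RES_J-1"].tolist())  # [100.0, 100.0, 91.0, 91.0]
```

With ffill alone, the first row of the second group would stay NaN, because nothing precedes it in that group to fill from.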