How to fill missing data in a DataFrame based on multiple conditions in the row and on the data above
I have the following DataFrame:
ID      LineID  TeamID  ShiftID  DateTime         Production  Theoretical  Scrap
1       3       1       NULL     18/6/2020 4:00   482.5291    511.2351
2       2       1       NULL     18/6/2020 5:00   467.8704    519.9842
3       1       1       NULL     18/6/2020 5:00   390.5945    480.2252
2186    3       1       NULL     18/6/2020 5:00   0                        0.5
2520    2       1       NULL     18/6/2020 5:00   0                        21
2840    1       1       NULL     18/6/2020 6:00   0                        12
4       1       1       NULL     18/6/2020 6:00   389.2222    480.2252
5       3       1       NULL     18/6/2020 6:00   516.0907    511.2351
6       2       1       NULL     18/6/2020 6:00   450.5216    519.9842
7       3       1       NULL     18/6/2020 6:00   397.9998    511.2351
8       2       1       NULL     18/6/2020 7:00   456.9486    519.9842
9       1       1       NULL     18/6/2020 7:00   414.6932    480.2252
1939    2       1       NULL     18/6/2020 7:00   0                        24
2462    3       1       NULL     18/6/2020 7:00   0                        3
3075    1       1       NULL     18/6/2020 7:00   0                        3.5
......
114678  1       1       NULL     18/6/2018 22:00  343.5955
114798  3       1       NULL     18/6/2018 22:00  191.2512
114888  2       1       NULL     18/6/2018 22:00  190.5125
114657  2       1       NULL     18/6/2018 22:00  414.6432
114738  1       1       NULL     18/6/2018 22:00  429.43
114885  3       1       NULL     18/6/2018 23:00  361.3246
114756  1       1       NULL     18/6/2018 23:00  409.51
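I need to fill in Theoretical where it is empty, but only in rows where Scrap is also empty.
The rule is: when LineID is 3, Theoretical is always 511.2351; when it is 2, Theoretical is always 519.9842; and when it is 1, Theoretical is always 480.2252. But when a row has a Scrap value, Theoretical should stay empty.
I can't figure out a forward-fill approach that does this.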
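Since the goal is to fill "from the data above", the rule can be sketched as a per-line forward fill. This is a minimal sketch, assuming blanks are read as NaN and that each LineID's known Theoretical value appears above its empty rows; the small frame is a hypothetical sample built from values in the question, and the names `filled` and `df` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample built from rows in the question; blanks are NaN.
df = pd.DataFrame({
    "LineID":      [3, 2, 1, 3, 2, 1, 1, 2, 3],
    "Production":  [482.5291, 467.8704, 390.5945, 0, 0, 0, 343.5955, 190.5125, 191.2512],
    "Theoretical": [511.2351, 519.9842, 480.2252, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    "Scrap":       [np.nan, np.nan, np.nan, 0.5, 21, 12, np.nan, np.nan, np.nan],
})

# Forward-fill Theoretical within each LineID: every row gets the last known
# value above it for the same line (NaN rows are never used as a source).
filled = df.groupby("LineID")["Theoretical"].ffill()

# Accept the filled value only where Scrap is missing; rows that carry a
# Scrap value keep their empty Theoretical.
df["Theoretical"] = df["Theoretical"].fillna(filled.where(df["Scrap"].isna()))
print(df)
```

Because `ffill` only looks upward, this depends on row order; if the first row of a line has no Theoretical value, that line stays empty.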
I tried the following code, but apart from the rows that matched, every other row was turned into NaN:
df['Theoretical'] = np.select(
    [(df['LineID']==3) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
     (df['LineID']==2) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
     (df['LineID']==1) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0)],
    (511.2351, 519.9842, 480.2252),
    np.nan)
ID      LineID  TeamID  ShiftID  DateTime         Production  Theoretical  Scrap
1       3       1       NULL     18/6/2020 4:00
2       2       1       NULL     18/6/2020 5:00
3       1       1       NULL     18/6/2020 5:00
2186    3       1       NULL     18/6/2020 5:00
2520    2       1       NULL     18/6/2020 5:00
2840    1       1       NULL     18/6/2020 6:00
4       1       1       NULL     18/6/2020 6:00
5       3       1       NULL     18/6/2020 6:00
6       2       1       NULL     18/6/2020 6:00
7       3       1       NULL     18/6/2020 6:00
8       2       1       NULL     18/6/2020 7:00
9       1       1       NULL     18/6/2020 7:00
1939    2       1       NULL     18/6/2020 7:00
2462    3       1       NULL     18/6/2020 7:00
3075    1       1       NULL     18/6/2020 7:00
......
114678  1       1       NULL     18/6/2018 22:00  343.5955    480.2252
114798  3       1       NULL     18/6/2018 22:00  191.2512    511.2351
114888  2       1       NULL     18/6/2018 22:00  190.5125    519.9842
114657  2       1       NULL     18/6/2018 22:00  414.6432    519.9842
114738  1       1       NULL     18/6/2018 22:00  429.43      480.2252
114885  3       1       NULL     18/6/2018 23:00  361.3246    511.2351
114756  1       1       NULL     18/6/2018 23:00  409.51      480.2252
What I need is this:
ID      LineID  TeamID  ShiftID  DateTime         Production  Theoretical  Scrap
1       3       1       NULL     18/6/2020 4:00   482.5291    511.2351
2       2       1       NULL     18/6/2020 5:00   467.8704    519.9842
3       1       1       NULL     18/6/2020 5:00   390.5945    480.2252
2186    3       1       NULL     18/6/2020 5:00   0                        0.5
2520    2       1       NULL     18/6/2020 5:00   0                        21
2840    1       1       NULL     18/6/2020 6:00   0                        12
4       1       1       NULL     18/6/2020 6:00   389.2222    480.2252
5       3       1       NULL     18/6/2020 6:00   516.0907    511.2351
6       2       1       NULL     18/6/2020 6:00   450.5216    519.9842
7       3       1       NULL     18/6/2020 6:00   397.9998    511.2351
8       2       1       NULL     18/6/2020 7:00   456.9486    519.9842
9       1       1       NULL     18/6/2020 7:00   414.6932    480.2252
1939    2       1       NULL     18/6/2020 7:00   0                        24
2462    3       1       NULL     18/6/2020 7:00   0                        3
3075    1       1       NULL     18/6/2020 7:00   0                        3.5
......
114678  1       1       NULL     18/6/2018 22:00  343.5955    480.2252
114798  3       1       NULL     18/6/2018 22:00  191.2512    511.2351
114888  2       1       NULL     18/6/2018 22:00  190.5125    519.9842
114657  2       1       NULL     18/6/2018 22:00  414.6432    519.9842
114738  1       1       NULL     18/6/2018 22:00  429.43      480.2252
114885  3       1       NULL     18/6/2018 23:00  361.3246    511.2351
114756  1       1       NULL     18/6/2018 23:00  409.51      480.2252
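The NaN rows in the attempt come from `np.select`'s `default` argument: every row where no condition is true receives `default`, so `np.nan` wipes out all existing values. In addition, if the blanks are NaN, `== 0` is never true for them (`NaN == 0` is False), so `isna()` is the matching test. A minimal sketch of the corrected call, assuming blanks are NaN and using a hypothetical sample frame; it passes the existing column as `default`, which NumPy broadcasts row by row:

```python
import numpy as np
import pandas as pd

# Hypothetical sample built from rows in the question; blanks are NaN.
df = pd.DataFrame({
    "LineID":      [3, 2, 1, 3, 1, 2],
    "Production":  [482.5291, 467.8704, 0, 0, 343.5955, 190.5125],
    "Theoretical": [511.2351, 519.9842, np.nan, np.nan, np.nan, np.nan],
    "Scrap":       [np.nan, np.nan, 12, 0.5, np.nan, np.nan],
})

# A row qualifies for filling only if both Theoretical and Scrap are empty.
empty = df["Theoretical"].isna() & df["Scrap"].isna()

conditions = [
    empty & (df["LineID"] == 3),
    empty & (df["LineID"] == 2),
    empty & (df["LineID"] == 1),
]
choices = [511.2351, 519.9842, 480.2252]

# default=df["Theoretical"] keeps the current value wherever no condition
# matches, instead of overwriting those rows with NaN.
df["Theoretical"] = np.select(conditions, choices, default=df["Theoretical"])
print(df)
```

If the empty cells are actually stored as 0 rather than NaN, the `isna()` tests would need to go back to `== 0`.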
This is certainly not the best solution, but you could try the following:
import numpy as np
import pandas as pd

# Small sample: Theoretical is known only in the first three rows
df_new = pd.DataFrame({
    "LineID": [1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1],
    "Theoretical": [480.2252, 519.9842, 511.2351, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    "Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34, np.nan, 0.4, np.nan, 10]
})
df_new
    LineID  Theoretical  Scrap
0        1     480.2252    NaN
1        2     519.9842    0.5
2        3     511.2351   21.0
3        1          NaN    NaN
4        2          NaN   24.0
5        1          NaN    NaN
6        1          NaN   40.0
7        2          NaN   34.0
8        3          NaN    NaN
9        1          NaN    0.4
10       2          NaN    NaN
11       1          NaN   10.0
# Fill Theoretical only where both Theoretical and Scrap are missing, one assignment per LineID
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 3), "Theoretical"] = 511.2351
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 2), "Theoretical"] = 519.9842
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 1), "Theoretical"] = 480.2252
df_new
    LineID  Theoretical  Scrap
0        1     480.2252    NaN
1        2     519.9842    0.5
2        3     511.2351   21.0
3        1     480.2252    NaN
4        2          NaN   24.0
5        1     480.2252    NaN
6        1          NaN   40.0
7        2          NaN   34.0
8        3     511.2351    NaN
9        1          NaN    0.4
10       2     519.9842    NaN
11       1          NaN   10.0
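The three `.loc` assignments can also be collapsed into a single lookup. A minimal sketch, using the same sample frame; the dictionary name `theoretical_by_line` is hypothetical, and `Series.map` turns each LineID into its constant (any LineID missing from the dictionary would map to NaN):

```python
import numpy as np
import pandas as pd

# Same sample frame as in the answer.
df_new = pd.DataFrame({
    "LineID": [1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1],
    "Theoretical": [480.2252, 519.9842, 511.2351] + [np.nan] * 9,
    "Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34, np.nan, 0.4, np.nan, 10],
})

# One lookup table instead of one .loc line per LineID.
theoretical_by_line = {1: 480.2252, 2: 519.9842, 3: 511.2351}

# Rows to fill: Theoretical and Scrap both missing.
mask = df_new["Theoretical"].isna() & df_new["Scrap"].isna()

# Map each selected row's LineID to its constant and write it back.
df_new.loc[mask, "Theoretical"] = df_new.loc[mask, "LineID"].map(theoretical_by_line)
print(df_new)
```

This produces the same result as the three assignments, and adding a new line only means adding one dictionary entry.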