如何根据行中的多个条件和上面的数据填充数据框中的缺失数据

How to fill missing data in dataframe based on multiple conditions in the row and from data above

我有以下数据框,

ID      LineID  TeamID  ShiftID DateTime        Production  Theoretical  Scrap
1       3       1       NULL    18/6/2020 4:00  482.5291    511.2351     
2       2       1       NULL    18/6/2020 5:00  467.8704    519.9842
3       1       1       NULL    18/6/2020 5:00  390.5945    480.2252    
2186    3       1       NULL    18/6/2020 5:00  0                        0.5
2520    2       1       NULL    18/6/2020 5:00  0                        21
2840    1       1       NULL    18/6/2020 6:00  0                        12
4       1       1       NULL    18/6/2020 6:00  389.2222    480.2252        
5       3       1       NULL    18/6/2020 6:00  516.0907    511.2351    
6       2       1       NULL    18/6/2020 6:00  450.5216    519.9842    
7       3       1       NULL    18/6/2020 6:00  397.9998    511.2351    
8       2       1       NULL    18/6/2020 7:00  456.9486    519.9842    
9       1       1       NULL    18/6/2020 7:00  414.6932    480.2252        
1939    2       1       NULL    18/6/2020 7:00  0                        24
2462    3       1       NULL    18/6/2020 7:00  0                        3
3075    1       1       NULL    18/6/2020 7:00  0                        3.5
1
......
......
......
114678  1       1       NULL    18/6/2018 22:00 343.5955        
114798  3       1       NULL    18/6/2018 22:00 191.2512        
114888  2       1       NULL    18/6/2018 22:00 190.5125        
114657  2       1       NULL    18/6/2018 22:00 414.6432        
114738  1       1       NULL    18/6/2018 22:00 429.43      
114885  3       1       NULL    18/6/2018 23:00 361.3246        
114756  1       1       NULL    18/6/2018 23:00 409.51      

我需要填写 Theoretical 为空的列,但只需要填写 scrap 也为空的列。

所以条件是,当 LineID 为 3 时,理论值始终为 511.2351,当其 2 时,理论值始终为 519.9842,当其为 1 时,其理论值始终为总是 480.2252。但是当有报废价值的时候,理论上应该是空的。

我似乎想不出这样的前向填充方法。

我尝试了以下代码,但除了那些行之外,所有其他行都变成了 NaN。

df['Theoretical'] = np.select([(df['LineID']==3) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
                        (df['LineID']==2) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
                        (df['LineID']==1) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),],
                       (511.2351,519.9842,480.2252), np.nan)

ID      LineID  TeamID  ShiftID DateTime        Production  Theoretical  Scrap
1       3       1       NULL    18/6/2020 4:00  
2       2       1       NULL    18/6/2020 5:00  
3       1       1       NULL    18/6/2020 5:00 
2186    3       1       NULL    18/6/2020 5:00  
2520    2       1       NULL    18/6/2020 5:00  
2840    1       1       NULL    18/6/2020 6:00  
4       1       1       NULL    18/6/2020 6:00      
5       3       1       NULL    18/6/2020 6:00  
6       2       1       NULL    18/6/2020 6:00     
7       3       1       NULL    18/6/2020 6:00   
8       2       1       NULL    18/6/2020 7:00    
9       1       1       NULL    18/6/2020 7:00   
1939    2       1       NULL    18/6/2020 7:00 
2462    3       1       NULL    18/6/2020 7:00  
3075    1       1       NULL    18/6/2020 7:00  
1
......
......
......
114678  1       1       NULL    18/6/2018 22:00 343.5955    480.2252    
114798  3       1       NULL    18/6/2018 22:00 191.2512    511.2351        
114888  2       1       NULL    18/6/2018 22:00 190.5125    519.9842    
114657  2       1       NULL    18/6/2018 22:00 414.6432    519.9842
114738  1       1       NULL    18/6/2018 22:00 429.43      480.2252
114885  3       1       NULL    18/6/2018 23:00 361.3246    511.2351    
114756  1       1       NULL    18/6/2018 23:00 409.51      480.2252

我需要这样

ID      LineID  TeamID  ShiftID DateTime        Production  Theoretical  Scrap
1       3       1       NULL    18/6/2020 4:00  482.5291    511.2351    
2       2       1       NULL    18/6/2020 5:00  467.8704    519.9842
3       1       1       NULL    18/6/2020 5:00  390.5945    480.2252    
2186    3       1       NULL    18/6/2020 5:00  0                        0.5
2520    2       1       NULL    18/6/2020 5:00  0                        21
2840    1       1       NULL    18/6/2020 6:00  0                        12
4       1       1       NULL    18/6/2020 6:00  389.2222    480.2252        
5       3       1       NULL    18/6/2020 6:00  516.0907    511.2351    
6       2       1       NULL    18/6/2020 6:00  450.5216    519.9842    
7       3       1       NULL    18/6/2020 6:00  397.9998    511.2351    
8       2       1       NULL    18/6/2020 7:00  456.9486    519.9842    
9       1       1       NULL    18/6/2020 7:00  414.6932    480.2252        
1939    2       1       NULL    18/6/2020 7:00  0                        24
2462    3       1       NULL    18/6/2020 7:00  0                        3
3075    1       1       NULL    18/6/2020 7:00  0                        3.5
1
......
......
......
114678  1       1       NULL    18/6/2018 22:00 343.5955    480.2252    
114798  3       1       NULL    18/6/2018 22:00 191.2512    511.2351        
114888  2       1       NULL    18/6/2018 22:00 190.5125    519.9842    
114657  2       1       NULL    18/6/2018 22:00 414.6432    519.9842
114738  1       1       NULL    18/6/2018 22:00 429.43      480.2252
114885  3       1       NULL    18/6/2018 23:00 361.3246    511.2351    
114756  1       1       NULL    18/6/2018 23:00 409.51      480.2252    

当然这不是最好的解决方案,但你可以尝试如下方法

df_new = pd.DataFrame({
    "LineID":[1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1], 
    "Theoretical": [480.2252, 519.9842, 511.2351, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    "Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34,  np.nan, 0.4,  np.nan, 10]
})

df_new 
    LineID  Theoretical     Scrap
0   1   480.2252    NaN
1   2   519.9842    0.5
2   3   511.2351    21.0
3   1   NaN     NaN
4   2   NaN     24.0
5   1   NaN     NaN
6   1   NaN     40.0
7   2   NaN     34.0
8   3   NaN     NaN
9   1   NaN     0.4
10  2   NaN     NaN
11  1   NaN     10.0

df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 3), "Theoretical"] = 511.2351
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 2), "Theoretical"] = 519.9842
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 1), "Theoretical"] = 480.2252

df_new

    LineID  Theoretical     Scrap
0   1   480.2252    NaN
1   2   519.9842    0.5
2   3   511.2351    21.0
3   1   480.2252    NaN
4   2   NaN     24.0
5   1   480.2252    NaN
6   1   NaN     40.0
7   2   NaN     34.0
8   3   511.2351    NaN
9   1   NaN     0.4
10  2   519.9842    NaN
11  1   NaN     10.0