Conditional fill of columns in a pandas df

Question

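This question is similar to several other questions about conditional filling. I'm trying to fill the columns conditionally, based on the following rules.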

If the value in Code starts with A, I want to keep the rows as they are.

If the value in Code starts with B, I want to keep the first value and return NaN's for the following rows until the next value in Code.

If the value in Code starts with C, I want to keep the same first value until the next float in ['Numx','Numy'].
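A minimal sketch of the three rules, applied to a small made-up frame rather than the question's data. It assumes that only codes starting with A, B or C open a new block (so blank codes and D1 inherit the previous block, which matches the desired output further down), and it swaps in groupby(...).transform('first') for the C rule. It is one possible approach, not the code from the answer below.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: '' marks a blank Code, np.nan a missing number.
df = pd.DataFrame({
    'Code': ['A1', '', 'B1', 'B1', '', 'C1', '', 'D1', 'A2'],
    'Numx': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, np.nan, np.nan, 9.0],
    'Numy': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, np.nan, np.nan, 0.9],
})
cols = ['Numx', 'Numy']

# Assumption: only codes starting with A, B or C open a new block;
# blanks and any other code (e.g. D1) inherit the previous block's label.
label = df['Code'].where(df['Code'].str[0].isin(['A', 'B', 'C'])).ffill()

# A new run starts whenever the label differs from the previous row.
run_id = label.ne(label.shift()).cumsum()
first_in_run = ~run_id.duplicated()

kind = label.str[0]
out = df.copy()

# Rule B: keep only the first row of each B run, NaN afterwards.
out.loc[(kind == 'B') & ~first_in_run, cols] = np.nan

# Rule C: repeat the run's first non-null values through the whole C run.
c_rows = kind == 'C'
out.loc[c_rows, cols] = (
    out.loc[c_rows, cols].groupby(run_id[c_rows]).transform('first')
)

# Rule A: rows are left untouched.
print(out)
```

Grouping on consecutive runs (rather than on the code itself) keeps a code that reappears later, such as a second A2 block, from being merged with its earlier block.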

import pandas as pd
import numpy as np


d = ({                          
      'Code' :['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3','','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1','','','D1','','B2'],
      'Numx' : [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,35.6,40.2,45.5,'',48.1,48.5,42.2, 40.1,48.5,42.2,'',48.5,42.2,43.1,44.1,'','','','',45.1],
      'Numy' : [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,'',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2]
      })

df = pd.DataFrame(data=d)
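Note that Numx and Numy mix floats with '' strings, so pandas stores them as object columns and the blanks are not treated as missing values. If genuine NaN's are wanted, one option (a suggestion, not something the question does) is to coerce the columns with pd.to_numeric. The trimmed frame below is made up for illustration:

```python
import pandas as pd

# Trimmed, hypothetical copy of the question's columns: '' marks a missing number.
df = pd.DataFrame({
    'Numx': [30.2, 30.5, '', 45.1],
    'Numy': [1.9, 2.3, '', 3.2],
})
print(df.dtypes)  # both columns are object because of the '' strings

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
num = df[['Numx', 'Numy']].apply(pd.to_numeric, errors='coerce')
print(num.dtypes)
print(num)
```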

Output:

   Code  Numx Numy
0    A1  30.2  1.9
1    A1  30.5  2.3
2        30.6  2.5
3    B1  35.6  2.2
4    B1  40.2  2.5
5    A2  45.5  3.1
6    A2  46.1  3.4
7        48.1  3.6
8    B2  48.5  3.7
9    B2  42.2  5.4
10        nan  nan       
11   A3  30.5  2.3
12   A3  30.6  2.5
13   A3  35.6  2.2
14       40.2  2.5
15   B1  45.5  3.1
16        nan  nan     
17   B4  48.1  3.6
18   B4  48.5  3.7
19   A2  42.2  5.4
20   A2  40.1  6.5
21   A1  48.5  8.5
22   A1  42.2  2.2
23        nan  nan      
24   B4  48.5  8.5
25   B4  42.2  2.2
26   C1  43.1  2.3
27   C1  44.1  2.5
28        nan  nan      
29        nan  nan   
30   D1   nan  nan      
31        nan  nan        
32   B2  45.1  3.2

I used the code posted for another question, but it returns too many NaN's:

df['Code_new'] = df['Code'].where(df['Code'].isin(['A1','A2','A3','A4','B1','B2','B4','C1'])).ffill()

df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated())
mask = df['Code_new'] == 'A1'
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()

This produces the following output:

   Code  Numx Numy Code_new
0    A1  30.2  1.9       A1
1    A1  30.2  1.9       A1
2        30.2  1.9       A1
3    B1  35.6  2.2       B1
4    B1   NaN  NaN       B1
5    A2  45.5  3.1       A2
6    A2   NaN  NaN       A2
7         NaN  NaN       A2
8    B2  48.5  3.7       B2
9    B2   NaN  NaN       B2
10        NaN  NaN       B2
11   A3  30.5  2.3       A3
12   A3   NaN  NaN       A3
13   A3   NaN  NaN       A3
14        NaN  NaN       A3
15   B1   NaN  NaN       B1
16        NaN  NaN       B1
17   B4  48.1  3.6       B4
18   B4   NaN  NaN       B4
19   A2   NaN  NaN       A2
20   A2   NaN  NaN       A2
21   A1  30.2  1.9       A1
22   A1  30.2  1.9       A1
23       30.2  1.9       A1
24   B4   NaN  NaN       B4
25   B4   NaN  NaN       B4
26   C1  43.1  2.3       C1
27   C1   NaN  NaN       C1
28        NaN  NaN       C1
29        NaN  NaN       C1
30   D1   NaN  NaN       C1
31        NaN  NaN       C1
32   B2   NaN  NaN       B2
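The extra NaN's most likely come from duplicated(): it flags every repeat of a label anywhere in the column, not just within one consecutive block. A label that reappears later (B1 at row 15, A2 at row 19, B4 at row 24, B2 at row 32) is therefore masked from its very first row. A small illustration on made-up labels:

```python
import pandas as pd

# Labels after the forward fill, with a label that reappears later (A2).
code_new = pd.Series(['A2', 'A2', 'B1', 'A2', 'A2'])

# duplicated() looks at the whole column, so the second A2 block is
# flagged from its very first row.
print(code_new.duplicated().tolist())
# [False, True, False, True, True]

# Numbering consecutive runs first keeps each block's first row unflagged.
run_id = code_new.ne(code_new.shift()).cumsum()
print(run_id.duplicated().tolist())
# [False, True, False, False, True]
```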

My desired output is:

   Code  Numx Numy
0    A1  30.2  1.9
1    A1  30.5  2.3
2        30.6  2.5
3    B1  35.6  2.2
4    B1   nan  nan
5    A2  45.5  3.1
6    A2  46.1  3.4
7        48.1  3.6
8    B2  48.5  3.7
9    B2   nan  nan
10        nan  nan        
11   A3  30.5  2.3
12   A3  30.6  2.5
13   A3  35.6  2.2
14       40.2  2.5
15   B1  45.5  3.1
16        nan  nan         
17   B4  48.1  3.6
18   B4   nan  nan
19   A2  42.2  5.4
20   A2  40.1  6.5
21   A1  48.5  8.5
22   A1  42.2  2.2
23        nan  nan      
24   B4  48.5  8.5
25   B4   nan  nan
26   C1  43.1  2.3
27   C1  43.1  2.3
28       43.1  2.3   
29       43.1  2.3   
30   D1  43.1  2.3   
31       43.1  2.3         
32   B2  45.1  3.2

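I think the line mask = df['Code_new'] == 'A1' is what I need to change. The code works, but only for the value 'A1' in Code. Is it as simple as adding all the other values here, i.e. A1-A4, B1-B4 and C1?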
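Since the rules depend only on the first letter of the code, a prefix test avoids enumerating A1-A4, B1-B4 and C1 one by one. A short sketch with Series.str.startswith, which accepts a tuple of prefixes (the answer below relies on this as well); the sample codes are made up:

```python
import pandas as pd

codes = pd.Series(['A1', 'A4', 'B2', 'C1', 'D1', ''])

# str.startswith accepts a tuple, so one call covers A1-A4, B1-B4 and C1.
is_block = codes.str.startswith(('A', 'B', 'C'))
print(is_block.tolist())
# [True, True, True, True, False, False]

# The first character gives the rule family directly ('' yields NaN).
print(codes.str[0].tolist())
```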

Answer 1

I think you need:

m2 = df['Code'].isin(['A1','A2','A3','A4','B1','B2','B4','C1'])

#create helper column with a unique label per consecutive block, e.g. A1_1, B1_2
df['Code_new'] = df['Code'].where(m2).ffill()
df['Code_new'] = (df['Code_new'] + '_' + 
                  df['Code_new'].ne(df['Code_new'].shift()).cumsum().astype(str))

#keep A groups as they are; in all other groups set every row after the first to NaN
m1 = df['Code_new'].str.startswith(tuple(['A1','A2','A3','A4'])).fillna(False)
df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated() & ~m1)

#forward fill the NaNs only in groups starting with C
mask = df['Code_new'].str.startswith('C').fillna(False)
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()

print (df)

   Code  Numx Numy Code_new
0    A1  30.2  1.9     A1_1
1    A1  30.5  2.3     A1_1
2        30.6  2.5     A1_1
3    B1  35.6  2.2     B1_2
4    B1   NaN  NaN     B1_2
5    A2  45.5  3.1     A2_3
6    A2  46.1  3.4     A2_3
7        48.1  3.6     A2_3
8    B2  48.5  3.7     B2_4
9    B2   NaN  NaN     B2_4
10        NaN  NaN     B2_4
11   A3  30.5  2.3     A3_5
12   A3  30.6  2.5     A3_5
13   A3  35.6  2.2     A3_5
14       40.2  2.5     A3_5
15   B1  45.5  3.1     B1_6
16        NaN  NaN     B1_6
17   B4  48.1  3.6     B4_7
18   B4   NaN  NaN     B4_7
19   A2  42.2  5.4     A2_8
20   A2  40.1  6.5     A2_8
21   A1  48.5  8.5     A1_9
22   A1  42.2  2.2     A1_9
23                     A1_9
24   B4  48.5  8.5    B4_10
25   B4   NaN  NaN    B4_10
26   C1  43.1  2.3    C1_11
27   C1  43.1  2.3    C1_11
28       43.1  2.3    C1_11
29       43.1  2.3    C1_11
30   D1  43.1  2.3    C1_11
31       43.1  2.3    C1_11
32   B2  45.1  3.2    B2_12
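To see how the helper column is built, here is the answer's Code_new construction run on a short made-up slice covering the C1/D1 stretch. Because D1 is not in the isin list, where() turns it into NaN and ffill() assigns it to the preceding C1 block, which is why rows 28-31 receive 43.1 / 2.3:

```python
import pandas as pd

# Hypothetical slice resembling rows 26-32 of the question's Code column.
code = pd.Series(['C1', '', '', 'D1', '', 'B2'])
valid = code.isin(['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'B4', 'C1'])

# Blanks and codes outside the list (D1) become NaN, then inherit the previous code.
code_new = code.where(valid).ffill()
print(code_new.tolist())  # ['C1', 'C1', 'C1', 'C1', 'C1', 'B2']

# Number consecutive runs so a code that reappears later gets a new label.
run = code_new.ne(code_new.shift()).cumsum().astype(str)
labels = code_new + '_' + run
print(labels.tolist())  # ['C1_1', 'C1_1', 'C1_1', 'C1_1', 'C1_1', 'B2_2']
```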
