Conditional fill of columns in a pandas df
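This question is similar to several others about conditional filling. I'm trying to fill the columns conditionally according to the following rules:
If the value in Code starts with A, I want to keep it as is.
If the value in Code starts with B, I want to keep the same initial value and return NaNs in the following rows until the next value in Code.
If the value in Code starts with C, I want to keep the same first value until the next float value in ['Numx','Numy'].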
import pandas as pd
import numpy as np
d = ({
'Code' :['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3','','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1','','','D1','','B2'],
'Numx' : [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,35.6,40.2,45.5,'',48.1,48.5,42.2, 40.1,48.5,42.2,'',48.5,42.2,43.1,44.1,'','','','',45.1],
'Numy' : [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,'',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2]
})
df = pd.DataFrame(data=d)
Output:
Code Numx Numy
0 A1 30.2 1.9
1 A1 30.5 2.3
2 30.6 2.5
3 B1 35.6 2.2
4 B1 40.2 2.5
5 A2 45.5 3.1
6 A2 46.1 3.4
7 48.1 3.6
8 B2 48.5 3.7
9 B2 42.2 5.4
10 nan nan
11 A3 30.5 2.3
12 A3 30.6 2.5
13 A3 35.6 2.2
14 40.2 2.5
15 B1 45.5 3.1
16 nan nan
17 B4 48.1 3.6
18 B4 48.5 3.7
19 A2 42.2 5.4
20 A2 40.1 6.5
21 A1 48.5 8.5
22 A1 42.2 2.2
23 nan nan
24 B4 48.5 8.5
25 B4 42.2 2.2
26 C1 43.1 2.3
27 C1 44.1 2.5
28 nan nan
29 nan nan
30 D1 nan nan
31 nan nan
32 B2 45.1 3.2
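Because the sample uses empty strings ('') for missing cells, Numx and Numy end up with object dtype, and methods such as `ffill()` only fill real `NaN`, not `''`. If blank cells are meant to be missing (an assumption about the data, not something the question states), one option is to coerce them to numbers first, shown here on a small hypothetical subset:

```python
import pandas as pd

# Hypothetical three-row subset of the question's data
df = pd.DataFrame({
    'Code': ['A1', '', 'B1'],
    'Numx': [30.2, '', 35.6],
    'Numy': [1.9, '', 2.2],
})
# errors='coerce' turns '' into NaN, so both columns become float64
df[['Numx', 'Numy']] = df[['Numx', 'Numy']].apply(pd.to_numeric, errors='coerce')
print(df.dtypes)
```

With this conversion applied to the full frame, blank rows such as row 23 display as NaN, which is what the desired output shows.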
I used code posted for another question, but it returns too many NaNs:
df['Code_new'] = df['Code'].where(df['Code'].isin(['A1','A2','A3','A4','B1','B2','B4','C1'])).ffill()
df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated())
mask = df['Code_new'] == 'A1'
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()
This produces the following output:
Code Numx Numy Code_new
0 A1 30.2 1.9 A1
1 A1 30.2 1.9 A1
2 30.2 1.9 A1
3 B1 35.6 2.2 B1
4 B1 NaN NaN B1
5 A2 45.5 3.1 A2
6 A2 NaN NaN A2
7 NaN NaN A2
8 B2 48.5 3.7 B2
9 B2 NaN NaN B2
10 NaN NaN B2
11 A3 30.5 2.3 A3
12 A3 NaN NaN A3
13 A3 NaN NaN A3
14 NaN NaN A3
15 B1 NaN NaN B1
16 NaN NaN B1
17 B4 48.1 3.6 B4
18 B4 NaN NaN B4
19 A2 NaN NaN A2
20 A2 NaN NaN A2
21 A1 30.2 1.9 A1
22 A1 30.2 1.9 A1
23 30.2 1.9 A1
24 B4 NaN NaN B4
25 B4 NaN NaN B4
26 C1 43.1 2.3 C1
27 C1 NaN NaN C1
28 NaN NaN C1
29 NaN NaN C1
30 D1 NaN NaN C1
31 NaN NaN C1
32 B2 NaN NaN B2
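Most of the extra NaNs here come from `duplicated()`, which compares each row against the whole column rather than against its own consecutive run. That is why the second A2 run (rows 19-20) is blanked: A2 already appeared at rows 5-7. A minimal illustration on a hypothetical five-value Series, contrasting it with labels that include a run counter:

```python
import pandas as pd

codes = pd.Series(['A1', 'A1', 'B1', 'A1', 'A1'])

# duplicated() looks at the whole column, so the second A1 run is flagged too
dup_global = codes.duplicated()

# A new run starts whenever the value differs from the previous row
run_id = codes.ne(codes.shift()).cumsum()
# duplicated() on "code_run" labels only flags rows after the first of each run
dup_per_run = (codes + '_' + run_id.astype(str)).duplicated()

print(dup_global.tolist())   # [False, True, False, True, True]
print(dup_per_run.tolist())  # [False, True, False, False, True]
```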
My desired output is:
Code Numx Numy
0 A1 30.2 1.9
1 A1 30.5 2.3
2 30.6 2.5
3 B1 35.6 2.2
4 B1 nan nan
5 A2 45.5 3.1
6 A2 46.1 3.4
7 48.1 3.6
8 B2 48.5 3.7
9 B2 nan nan
10 nan nan
11 A3 30.5 2.3
12 A3 30.6 2.5
13 A3 35.6 2.2
14 40.2 2.5
15 B1 45.5 3.1
16 nan nan
17 B4 48.1 3.6
18 B4 nan nan
19 A2 42.2 5.4
20 A2 40.1 6.5
21 A1 48.5 8.5
22 A1 42.2 2.2
23 nan nan
24 B4 48.5 8.5
25 B4 nan nan
26 C1 43.1 2.3
27 C1 43.1 2.3
28 43.1 2.3
29 43.1 2.3
30 D1 43.1 2.3
31 43.1 2.3
32 B2 45.1 3.2
I think the line mask = df['Code_new'] == 'A1' needs to change. The code works, but only for the 'A1' values in Code. Is it as simple as adding all the other values here, i.e. A1-A4, B1-B4, C1?
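Changing only that line is probably not enough, because the earlier `duplicated()` mask has already blanked rows that should be kept. One alternative, which is not the approach used in the answer and is only a sketch under stated assumptions, derives everything from run ids with `groupby().cumcount()` and `groupby().transform('first')`, keyed on the first letter of the code. The sample frame and the `known` list are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two separate A1 runs, a B1 run and a C1 run
# that absorbs an unrecognised code (D1), as in the question's data
df = pd.DataFrame({
    'Code': ['A1', 'A1', 'B1', 'B1', 'A1', 'C1', 'D1', 'C1'],
    'Numx': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, np.nan, 7.0],
})
known = ['A1', 'B1', 'C1']  # hypothetical list of recognised codes

# Carry the last recognised code forward over unrecognised rows
code = df['Code'].where(df['Code'].isin(known)).ffill()
# One id per consecutive run of the same code
run = code.ne(code.shift()).cumsum()
first_in_run = df.groupby(run).cumcount().eq(0)
letter = code.str[0]

out = df['Numx'].copy()
# B runs: keep only the first row of each run
out[letter.eq('B') & ~first_in_run] = np.nan
# C runs: repeat the run's first non-null value on every row
is_c = letter.eq('C')
out[is_c] = df['Numx'].groupby(run).transform('first')[is_c]
# A runs are left untouched
print(out.tolist())  # [1.0, 2.0, 3.0, nan, 5.0, 6.0, 6.0, 6.0]
```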
I think you need:
m2 = df['Code'].isin(['A1','A2','A3','A4','B1','B2','B4','C1'])
# create a helper column with a unique label for each consecutive run of codes
df['Code_new'] = df['Code'].where(m2).ffill()
df['Code_new'] = (df['Code_new'] + '_' +
                  df['Code_new'].ne(df['Code_new'].shift()).cumsum().astype(str))
# runs starting with A keep their values; in all other runs,
# every row after the first is set to NaN
m1 = df['Code_new'].str.startswith(('A1','A2','A3','A4'), na=False)
df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated() & ~m1)
# forward-fill only within runs starting with C
mask = df['Code_new'].str.startswith('C', na=False)
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()
print(df)
Code Numx Numy Code_new
0 A1 30.2 1.9 A1_1
1 A1 30.5 2.3 A1_1
2 30.6 2.5 A1_1
3 B1 35.6 2.2 B1_2
4 B1 NaN NaN B1_2
5 A2 45.5 3.1 A2_3
6 A2 46.1 3.4 A2_3
7 48.1 3.6 A2_3
8 B2 48.5 3.7 B2_4
9 B2 NaN NaN B2_4
10 NaN NaN B2_4
11 A3 30.5 2.3 A3_5
12 A3 30.6 2.5 A3_5
13 A3 35.6 2.2 A3_5
14 40.2 2.5 A3_5
15 B1 45.5 3.1 B1_6
16 NaN NaN B1_6
17 B4 48.1 3.6 B4_7
18 B4 NaN NaN B4_7
19 A2 42.2 5.4 A2_8
20 A2 40.1 6.5 A2_8
21 A1 48.5 8.5 A1_9
22 A1 42.2 2.2 A1_9
23 A1_9
24 B4 48.5 8.5 B4_10
25 B4 NaN NaN B4_10
26 C1 43.1 2.3 C1_11
27 C1 43.1 2.3 C1_11
28 43.1 2.3 C1_11
29 43.1 2.3 C1_11
30 D1 43.1 2.3 C1_11
31 43.1 2.3 C1_11
32 B2 45.1 3.2 B2_12
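For reference, here is a self-contained sketch of the same three steps that rebuilds the question's sample frame, so it can be run as-is. It assumes a pandas version where `Series.str.startswith` accepts `na=`, and it checks for the prefix 'A' (the question's rule) instead of the explicit A1-A4 tuple, which is equivalent for this data:

```python
import pandas as pd

d = {
    'Code': ['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3','','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1','','','D1','','B2'],
    'Numx': [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,35.6,40.2,45.5,'',48.1,48.5,42.2,40.1,48.5,42.2,'',48.5,42.2,43.1,44.1,'','','','',45.1],
    'Numy': [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,'',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2],
}
df = pd.DataFrame(data=d)
cols = ['Numx', 'Numy']

# Forward-fill the last recognised code over blank/unrecognised rows (e.g. D1)
known = df['Code'].isin(['A1','A2','A3','A4','B1','B2','B4','C1'])
df['Code_new'] = df['Code'].where(known).ffill()
# Tag each consecutive run so repeated codes in separate runs stay distinct
run_id = df['Code_new'].ne(df['Code_new'].shift()).cumsum()
df['Code_new'] = df['Code_new'] + '_' + run_id.astype(str)

# A runs keep their values; other runs keep only their first row
is_a = df['Code_new'].str.startswith('A', na=False)
df[cols] = df[cols].mask(df['Code_new'].duplicated() & ~is_a)

# C runs: forward-fill the first row's values over the blanked rows
is_c = df['Code_new'].str.startswith('C', na=False)
df.loc[is_c, cols] = df.loc[is_c, cols].ffill()
print(df)
```

Row 23 keeps its original empty string here, which is why it prints blank rather than NaN.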