How to select the rows before and after NaN in pandas?
I have a dataframe like the following:
Name Age Job
0 Alex 20 Student
1 Sara 21 Doctor
2 john 23 NaN
3 kevin 22 Teacher
4 Rosa 20 senior manager
5 johanes 25 Dentist
6 lina 23 Student
7 yaser 25 Pilot
8 jason 20 Manager
9 Ali 23 NaN
10 Ahmad 21 Professor
11 Joe 24 NaN
12 Donald 29 Waiter
.
.
.
.
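I want to select the rows before and after each row that has a NaN value in the Job column, along with the NaN row itself. For that I have the following code: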
Rows = df[df.shift(1, fill_value="dummy").Job.isna() | df.Job.isna() | df.shift(-1, fill_value="dummy").Job.isna()]
print(Rows)
The result looks like this:
1 Sara 21 Doctor
2 john 23 NaN
3 kevin 22 Teacher
8 jason 20 Manager
9 Ali 23 NaN
10 Ahmad 21 Professor
11 Joe 24 NaN
12 Donald 29 Waiter
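The only problem here is row 10: it should appear twice in the result, because it is both the row after a NaN (row 9) and the row before a NaN (row 11), so it sits between two rows with NaN values. So in the end I want this: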
1 Sara 21 Doctor
2 john 23 NaN
3 kevin 22 Teacher
8 jason 20 Manager
9 Ali 23 NaN
10 Ahmad 21 Professor
10 Ahmad 21 Professor
11 Joe 24 NaN
12 Donald 29 Waiter
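So every row that lies between two rows with NaN values should also appear twice in the result (that is, be duplicated). Is there any way to do this? Any help would be appreciated.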
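Use concat to combine the rows after a NaN, the rows before a NaN, and the NaN rows themselves, each selected by its own condition: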
m = df.Job.isna()
# rows after a NaN, rows before a NaN, and the NaN rows themselves
df = pd.concat([df[m.shift(fill_value=False)],
                df[m.shift(-1, fill_value=False)],
                df[m]]).sort_index()
print(df)
Name Age Job
1 Sara 21 Doctor
2 john 23 NaN
3 kevin 22 Teacher
8 jason 20 Manager
9 Ali 23 NaN
10 Ahmad 21 Professor
10 Ahmad 21 Professor
11 Joe 24 NaN
12 Donald 29 Waiter
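For anyone who wants to try this out, here is a minimal self-contained sketch of the same concat approach. It assumes the dataframe holds only the 13 rows shown in the question (the elided rows are not reproduced), and the names after, before and result are introduced here purely for illustration so that the original df is not overwritten.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the 13 rows shown there).
df = pd.DataFrame({
    "Name": ["Alex", "Sara", "john", "kevin", "Rosa", "johanes", "lina",
             "yaser", "jason", "Ali", "Ahmad", "Joe", "Donald"],
    "Age": [20, 21, 23, 22, 20, 25, 23, 25, 20, 23, 21, 24, 29],
    "Job": ["Student", "Doctor", np.nan, "Teacher", "senior manager",
            "Dentist", "Student", "Pilot", "Manager", np.nan,
            "Professor", np.nan, "Waiter"],
})

m = df["Job"].isna()                        # True where Job is NaN
after = df[m.shift(1, fill_value=False)]    # rows directly after a NaN row
before = df[m.shift(-1, fill_value=False)]  # rows directly before a NaN row

# Concatenating keeps a row once per condition it satisfies, so a row
# sitting between two NaN rows (index 10 here) appears twice.
result = pd.concat([after, before, df[m]]).sort_index()
print(result)
```

A single boolean mask built with | can only keep or drop each row once, which is why the original attempt never duplicates row 10; concatenating the three selections counts each match separately.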