2 Pandas - Find Unmatched Rows and Delete Extra Rows Not Matching Hour
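I need some way to check the "hour" column for differences row by row, so that when the df1 "hour" data skips an hour, the corresponding row is dropped from df2. Once the extra rows are dropped, the length of df2 should match the length of df1. I tried isin, but it did not do the job, probably because the hours repeat every day. I have 2 dfs, df1 and df2. df1 looks like this: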
plant_name wind_speed_obs hour day month year
0 BIG HORN I 4.354742 1 1 1 2018
1 BIG HORN I 4.493089 2 1 1 2018
2 BIG HORN I 3.270214 3 1 1 2018
3 BIG HORN I 2.201387 4 1 1 2018
4 BIG HORN I 1.107117 5 1 1 2018
5 BIG HORN I 0.653544 6 1 1 2018
6 BIG HORN I 0.437724 7 1 1 2018
7 BIG HORN I 1.039667 8 1 1 2018
8 BIG HORN I 0.859894 9 1 1 2018
9 BIG HORN I 0.984382 10 1 1 2018
10 BIG HORN I 0.867333 11 1 1 2018
11 BIG HORN I 0.651906 12 1 1 2018
12 BIG HORN I 0.707006 13 1 1 2018
13 BIG HORN I 0.794844 14 1 1 2018
14 BIG HORN I 0.808548 15 1 1 2018
15 BIG HORN I 0.631703 16 1 1 2018
16 BIG HORN I 0.662685 17 1 1 2018
17 BIG HORN I 0.792321 18 1 1 2018
18 BIG HORN I 0.996753 19 1 1 2018
19 BIG HORN I 1.177580 20 1 1 2018
20 BIG HORN I 1.608482 21 1 1 2018
21 BIG HORN I 1.964004 22 1 1 2018
22 BIG HORN I 1.695751 23 1 1 2018
24 BIG HORN I 2.244386 1 2 1 2018
25 BIG HORN I 3.111387 2 2 1 2018
df2 looks like this:
plant_name wind_speed_ms hour day month year
0 BIG HORN I 3.6 1 1 1 2018
1 BIG HORN I 3.1 2 1 1 2018
2 BIG HORN I 3.1 3 1 1 2018
3 BIG HORN I 2.0 4 1 1 2018
4 BIG HORN I 1.6 5 1 1 2018
5 BIG HORN I 0.8 6 1 1 2018
6 BIG HORN I 0.8 7 1 1 2018
7 BIG HORN I 1.0 8 1 1 2018
8 BIG HORN I 0.3 9 1 1 2018
9 BIG HORN I 0.1 10 1 1 2018
10 BIG HORN I 1.1 11 1 1 2018
11 BIG HORN I 1.9 12 1 1 2018
12 BIG HORN I 1.9 13 1 1 2018
13 BIG HORN I 1.0 14 1 1 2018
14 BIG HORN I 0.7 15 1 1 2018
15 BIG HORN I 2.1 16 1 1 2018
16 BIG HORN I 3.5 17 1 1 2018
17 BIG HORN I 2.1 18 1 1 2018
18 BIG HORN I 1.3 19 1 1 2018
19 BIG HORN I 2.3 20 1 1 2018
20 BIG HORN I 2.8 21 1 1 2018
21 BIG HORN I 3.0 22 1 1 2018
22 BIG HORN I 2.5 23 1 1 2018
23 BIG HORN I 2.2 0 2 1 2018
24 BIG HORN I 3.9 1 2 1 2018
25 BIG HORN I 4.3 2 2 1 2018
26 BIG HORN I 3.5 3 2 1 2018
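df2 has a "0" hour row (see index = 23 above) that is not found in df1. After finding that unmatched hour in the "hour" column, the df2 DataFrame should look like this, with the "0" hour row removed. Thanks!
New df2: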
plant_name wind_speed_ms hour day month year
0 BIG HORN I 3.6 1 1 1 2018
1 BIG HORN I 3.1 2 1 1 2018
2 BIG HORN I 3.1 3 1 1 2018
3 BIG HORN I 2.0 4 1 1 2018
4 BIG HORN I 1.6 5 1 1 2018
5 BIG HORN I 0.8 6 1 1 2018
6 BIG HORN I 0.8 7 1 1 2018
7 BIG HORN I 1.0 8 1 1 2018
8 BIG HORN I 0.3 9 1 1 2018
9 BIG HORN I 0.1 10 1 1 2018
10 BIG HORN I 1.1 11 1 1 2018
11 BIG HORN I 1.9 12 1 1 2018
12 BIG HORN I 1.9 13 1 1 2018
13 BIG HORN I 1.0 14 1 1 2018
14 BIG HORN I 0.7 15 1 1 2018
15 BIG HORN I 2.1 16 1 1 2018
16 BIG HORN I 3.5 17 1 1 2018
17 BIG HORN I 2.1 18 1 1 2018
18 BIG HORN I 1.3 19 1 1 2018
19 BIG HORN I 2.3 20 1 1 2018
20 BIG HORN I 2.8 21 1 1 2018
21 BIG HORN I 3.0 22 1 1 2018
22 BIG HORN I 2.5 23 1 1 2018
24 BIG HORN I 3.9 1 2 1 2018
25 BIG HORN I 4.3 2 2 1 2018
26 BIG HORN I 3.5 3 2 1 2018
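The isin problem described in the question can be reproduced with a small sketch. The frames below are hypothetical toy data, not the asker's full dataset: df1 is missing hour 0 on day 2 only, while hour 0 does exist on day 1. Because isin on the "hour" column only asks whether the value appears anywhere in df1, the extra row survives.

```python
import pandas as pd

# Hypothetical toy data: df1 has hour 0 on day 1 but skips it on day 2.
df1 = pd.DataFrame({"hour": [0, 1, 23, 1, 23],
                    "day":  [1, 1, 1,  2, 2]})
df2 = pd.DataFrame({"hour": [0, 1, 23, 0, 1, 23],
                    "day":  [1, 1, 1,  2, 2, 2]})

# Membership is tested against every hour value in df1, regardless of day,
# so the (hour=0, day=2) row is still considered a match.
hour_only = df2[df2["hour"].isin(df1["hour"])]

print(len(df1), len(df2), len(hour_only))
```

In this toy case the filtered frame keeps all of df2's rows, so its length still differs from df1's.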
Use isin on all of the date/time columns:
df2 = df2[df2['hour'].isin(df1['hour']) &
          df2['day'].isin(df1['day']) &
          df2['month'].isin(df1['month']) &
          df2['year'].isin(df1['year'])]
df2
Out[1]:
plant_name wind_speed_ms hour day month year
0 BIG HORN I 3.6 1 1 1 2018
1 BIG HORN I 3.1 2 1 1 2018
2 BIG HORN I 3.1 3 1 1 2018
3 BIG HORN I 2.0 4 1 1 2018
4 BIG HORN I 1.6 5 1 1 2018
5 BIG HORN I 0.8 6 1 1 2018
6 BIG HORN I 0.8 7 1 1 2018
7 BIG HORN I 1.0 8 1 1 2018
8 BIG HORN I 0.3 9 1 1 2018
9 BIG HORN I 0.1 10 1 1 2018
10 BIG HORN I 1.1 11 1 1 2018
11 BIG HORN I 1.9 12 1 1 2018
12 BIG HORN I 1.9 13 1 1 2018
13 BIG HORN I 1.0 14 1 1 2018
14 BIG HORN I 0.7 15 1 1 2018
15 BIG HORN I 2.1 16 1 1 2018
16 BIG HORN I 3.5 17 1 1 2018
17 BIG HORN I 2.1 18 1 1 2018
18 BIG HORN I 1.3 19 1 1 2018
19 BIG HORN I 2.3 20 1 1 2018
20 BIG HORN I 2.8 21 1 1 2018
21 BIG HORN I 3.0 22 1 1 2018
22 BIG HORN I 2.5 23 1 1 2018
24 BIG HORN I 3.9 1 2 1 2018 #row with index of 23 removed
25 BIG HORN I 4.3 2 2 1 2018
26 BIG HORN I 3.5 3 2 1 2018
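Note that the isin filter above tests each column independently, so it removes the index 23 row here only because hour 0 never appears anywhere in the df1 excerpt. If the full data has hour 0 on some other day, a row-wise comparison on the combined keys may be closer to what the question asks for. The following is a minimal sketch of that alternative using DataFrame.merge with indicator=True. The toy values, the keys list, and the names flags, mask, and df2_matched are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy frames with the same column layout as in the question.
df1 = pd.DataFrame({
    "plant_name": ["BIG HORN I"] * 4,
    "wind_speed_obs": [1.96, 1.70, 2.24, 3.11],
    "hour": [22, 23, 1, 2],
    "day": [1, 1, 2, 2],
    "month": [1] * 4,
    "year": [2018] * 4,
})
df2 = pd.DataFrame({
    "plant_name": ["BIG HORN I"] * 5,
    "wind_speed_ms": [3.0, 2.5, 2.2, 3.9, 4.3],
    "hour": [22, 23, 0, 1, 2],
    "day": [1, 1, 2, 2, 2],
    "month": [1] * 5,
    "year": [2018] * 5,
})

# Columns that together identify one observation (an assumption).
keys = ["plant_name", "year", "month", "day", "hour"]

# Left-merge df2 against the distinct key combinations in df1. The "_merge"
# indicator column is "both" only where the whole key tuple exists in df1.
flags = df2.merge(df1[keys].drop_duplicates(), on=keys, how="left", indicator=True)

# The left merge keeps df2's row order, so the mask lines up positionally.
mask = (flags["_merge"] == "both").to_numpy()
df2_matched = df2[mask]

print(df2_matched)
```

This keeps df2's original index labels, so the dropped row shows up as a gap, as in the output above. On the excerpt shown in the question, df1 ends at hour 2 of day 2, so a row-wise match like this would also drop df2's hour 3 row unless df1 continues past the excerpt.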