Pandas - Find Unmatched Rows and Delete Extra Rows Not Matching Hour

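I need some way to check the "hour" column row by row for differences, so that when the df1 "hour" data skips an hour, the corresponding row is deleted from df2. Once the extra rows are removed, the length of df2 should match the length of df1. I tried using isin, but it didn't do the job for me, probably because the hours repeat every day. I have two DataFrames, df1 and df2. df1 looks like this: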

    plant_name  wind_speed_obs  hour  day  month  year
0   BIG HORN I        4.354742     1    1      1  2018
1   BIG HORN I        4.493089     2    1      1  2018
2   BIG HORN I        3.270214     3    1      1  2018
3   BIG HORN I        2.201387     4    1      1  2018
4   BIG HORN I        1.107117     5    1      1  2018
5   BIG HORN I        0.653544     6    1      1  2018
6   BIG HORN I        0.437724     7    1      1  2018
7   BIG HORN I        1.039667     8    1      1  2018
8   BIG HORN I        0.859894     9    1      1  2018
9   BIG HORN I        0.984382    10    1      1  2018
10  BIG HORN I        0.867333    11    1      1  2018
11  BIG HORN I        0.651906    12    1      1  2018
12  BIG HORN I        0.707006    13    1      1  2018
13  BIG HORN I        0.794844    14    1      1  2018
14  BIG HORN I        0.808548    15    1      1  2018
15  BIG HORN I        0.631703    16    1      1  2018
16  BIG HORN I        0.662685    17    1      1  2018
17  BIG HORN I        0.792321    18    1      1  2018
18  BIG HORN I        0.996753    19    1      1  2018
19  BIG HORN I        1.177580    20    1      1  2018
20  BIG HORN I        1.608482    21    1      1  2018
21  BIG HORN I        1.964004    22    1      1  2018
22  BIG HORN I        1.695751    23    1      1  2018
24  BIG HORN I        2.244386     1    2      1  2018
25  BIG HORN I        3.111387     2    2      1  2018

df2 looks like this:

    plant_name  wind_speed_ms  hour  day  month  year
0   BIG HORN I            3.6     1    1      1  2018
1   BIG HORN I            3.1     2    1      1  2018
2   BIG HORN I            3.1     3    1      1  2018
3   BIG HORN I            2.0     4    1      1  2018
4   BIG HORN I            1.6     5    1      1  2018
5   BIG HORN I            0.8     6    1      1  2018
6   BIG HORN I            0.8     7    1      1  2018
7   BIG HORN I            1.0     8    1      1  2018
8   BIG HORN I            0.3     9    1      1  2018
9   BIG HORN I            0.1    10    1      1  2018
10  BIG HORN I            1.1    11    1      1  2018
11  BIG HORN I            1.9    12    1      1  2018
12  BIG HORN I            1.9    13    1      1  2018
13  BIG HORN I            1.0    14    1      1  2018
14  BIG HORN I            0.7    15    1      1  2018
15  BIG HORN I            2.1    16    1      1  2018
16  BIG HORN I            3.5    17    1      1  2018
17  BIG HORN I            2.1    18    1      1  2018
18  BIG HORN I            1.3    19    1      1  2018
19  BIG HORN I            2.3    20    1      1  2018
20  BIG HORN I            2.8    21    1      1  2018
21  BIG HORN I            3.0    22    1      1  2018
22  BIG HORN I            2.5    23    1      1  2018
23  BIG HORN I            2.2     0    2      1  2018
24  BIG HORN I            3.9     1    2      1  2018
25  BIG HORN I            4.3     2    2      1  2018
26  BIG HORN I            3.5     3    2      1  2018

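df2 has a row with hour "0" (see index 23 above) that is not found in df1. After finding that unmatched hour in df2's "hour" column, the df2 DataFrame should look like this, with the hour "0" row removed. New df2 (thanks!):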

    plant_name  wind_speed_ms  hour  day  month  year
0   BIG HORN I            3.6     1    1      1  2018
1   BIG HORN I            3.1     2    1      1  2018
2   BIG HORN I            3.1     3    1      1  2018
3   BIG HORN I            2.0     4    1      1  2018
4   BIG HORN I            1.6     5    1      1  2018
5   BIG HORN I            0.8     6    1      1  2018
6   BIG HORN I            0.8     7    1      1  2018
7   BIG HORN I            1.0     8    1      1  2018
8   BIG HORN I            0.3     9    1      1  2018
9   BIG HORN I            0.1    10    1      1  2018
10  BIG HORN I            1.1    11    1      1  2018
11  BIG HORN I            1.9    12    1      1  2018
12  BIG HORN I            1.9    13    1      1  2018
13  BIG HORN I            1.0    14    1      1  2018
14  BIG HORN I            0.7    15    1      1  2018
15  BIG HORN I            2.1    16    1      1  2018
16  BIG HORN I            3.5    17    1      1  2018
17  BIG HORN I            2.1    18    1      1  2018
18  BIG HORN I            1.3    19    1      1  2018
19  BIG HORN I            2.3    20    1      1  2018
20  BIG HORN I            2.8    21    1      1  2018
21  BIG HORN I            3.0    22    1      1  2018
22  BIG HORN I            2.5    23    1      1  2018
24  BIG HORN I            3.9     1    2      1  2018
25  BIG HORN I            4.3     2    2      1  2018
26  BIG HORN I            3.5     3    2      1  2018

Use isin on all the date/time columns:

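# Keep only the df2 rows whose hour, day, month and year each appear in df1.
# Note: each isin test is independent, so the four values are not checked as one combination.
df2 = df2[df2['hour'].isin(df1['hour']) &
          df2['day'].isin(df1['day']) &
          df2['month'].isin(df1['month']) &
          df2['year'].isin(df1['year'])]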
df2
Out[1]: 
    plant_name  wind_speed_ms  hour  day  month  year
0   BIG HORN I            3.6     1    1      1  2018
1   BIG HORN I            3.1     2    1      1  2018
2   BIG HORN I            3.1     3    1      1  2018
3   BIG HORN I            2.0     4    1      1  2018
4   BIG HORN I            1.6     5    1      1  2018
5   BIG HORN I            0.8     6    1      1  2018
6   BIG HORN I            0.8     7    1      1  2018
7   BIG HORN I            1.0     8    1      1  2018
8   BIG HORN I            0.3     9    1      1  2018
9   BIG HORN I            0.1    10    1      1  2018
10  BIG HORN I            1.1    11    1      1  2018
11  BIG HORN I            1.9    12    1      1  2018
12  BIG HORN I            1.9    13    1      1  2018
13  BIG HORN I            1.0    14    1      1  2018
14  BIG HORN I            0.7    15    1      1  2018
15  BIG HORN I            2.1    16    1      1  2018
16  BIG HORN I            3.5    17    1      1  2018
17  BIG HORN I            2.1    18    1      1  2018
18  BIG HORN I            1.3    19    1      1  2018
19  BIG HORN I            2.3    20    1      1  2018
20  BIG HORN I            2.8    21    1      1  2018
21  BIG HORN I            3.0    22    1      1  2018
22  BIG HORN I            2.5    23    1      1  2018
24  BIG HORN I            3.9     1    2      1  2018  #row with index of 23 removed
25  BIG HORN I            4.3     2    2      1  2018
26  BIG HORN I            3.5     3    2      1  2018
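
The filter above works on the sample data because hour 0 never appears anywhere in df1. Each isin test is evaluated per column, though, so an hour that is missing on one day but present on another would not be caught, which may be what the question ran into. A possible alternative, sketched here under assumptions, is to match on the combined key instead. The small frames, the `keys` list, and the name `df2_matched` are made up for illustration and are not from the original post:

```python
import pandas as pd

# Columns that together identify one observation (assumed key).
keys = ["plant_name", "hour", "day", "month", "year"]

# Hypothetical stand-in data: df1 has no hour 0 and no hour 1 on day 2.
df1 = pd.DataFrame({
    "plant_name": ["BIG HORN I"] * 3,
    "wind_speed_obs": [4.35, 4.49, 3.11],
    "hour": [1, 2, 2],
    "day": [1, 1, 2],
    "month": [1] * 3,
    "year": [2018] * 3,
})
df2 = pd.DataFrame({
    "plant_name": ["BIG HORN I"] * 5,
    "wind_speed_ms": [3.6, 3.1, 2.2, 3.9, 4.3],
    "hour": [1, 2, 0, 1, 2],
    "day": [1, 1, 2, 2, 2],
    "month": [1] * 5,
    "year": [2018] * 5,
})

# Build one composite key per row, so hour/day/month/year are compared together.
df1_keys = pd.MultiIndex.from_frame(df1[keys])
df2_keys = pd.MultiIndex.from_frame(df2[keys])

# Keep only the df2 rows whose full key also exists in df1.
# Boolean-mask indexing preserves df2's original index labels.
df2_matched = df2[df2_keys.isin(df1_keys)]
print(df2_matched)
```

In this made-up data, the per-column isin filter would keep the hour 1 row on day 2 (hour 1 exists on day 1, and day 2 exists in df1), while the composite-key filter drops it along with the hour 0 row, so df2_matched ends up with the same number of rows as df1.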