Python - Select 一天中每个小时都有数据的所有行

Python - Select all rows where there is data for each hour in a day

我有一个包含一系列日期的数据框:

Year          Month       Day      Hour

2020          12          3           22

2021          1           1           0

2021          1           1           1

2021          1           1           2

...
2021          1           1           23

2021          1           2           1

2021          1           2           3
...

我想 return 包含一天中所有 24 小时信息的日期的所有行。在上面的示例中,我只想 return 行:

2021          1           1           0

2021          1           1           1

2021          1           1           2

...
2021          1           1           23

我的数据集很长。如有任何帮助,我将不胜感激。谢谢。

import pandas as pd
import random as rd

# generate dummy data
sz = 40000
df = pd.DataFrame()
df['Y'] = [rd.randint(2020, 2021) for _ in range(sz)]
df['M'] = [rd.randint(1, 12) for _ in range(sz)]
df['D'] = [rd.randint(1, 31) for _ in range(sz)]
df['H'] = [rd.randint(0, 23) for _ in range(sz)]

# make an ethalon hour sequence
h24 = [i for i in range(24)]

# group and check if we have 24 hours in the group
# if NaN then no 24 hours here - drop, explode the rest
df = df.groupby(by=['Y', 'M', 'D']).apply(lambda x: None if x.value_counts().size != 24 else h24). \
    dropna(how='any').explode().reset_index().rename(columns={0: "H"})
print(df)

打印:

         Y   M   D   H
0     2020   1   3   0
1     2020   1   3   1
2     2020   1   3   2
3     2020   1   3   3
4     2020   1   3   4
...    ...  ..  ..  ..
1363  2021  12  11  19
1364  2021  12  11  20
1365  2021  12  11  21
1366  2021  12  11  22
1367  2021  12  11  23

[1368 rows x 4 columns]