How to match a DatetimeIndex on everything but the year?

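I have a dataset with missing values and a DatetimeIndex. I want to fill each missing value with the mean of the other values reported for the same month, day and hour. If no value was reported for that particular month/day/hour in any year, I want to fall back to the interpolated mean of the nearest reported hours. How can I do this? My current approach is: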

df_Na = df_Na[df_Na['Generation'].isna()]
df_raw = df_raw[~df_raw['Generation'].isna()]
# reduce to month
same_month = df_raw[df_raw.index.month.isin(df_Na.index.month)]
# reduce to same day
same_day = same_month[same_month.index.day.isin(df_Na.index.day)]
# reduce to hour
same_hour = same_day[same_day.index.hour.isin(df_Na.index.hour)]

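df_Na holds all the missing values I want to fill, and df_raw holds all the reported values I want to take the mean from. The dataset is huge, which is why I want to avoid for loops at all costs.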

My data looks like this:

df_Na

                     Generation
2017-12-02 19:00:00         NaN
2021-01-12 00:00:00         NaN
2021-01-12 01:00:00         NaN
..............................
2021-02-12 20:00:00         NaN
2021-02-12 21:00:00         NaN
2021-02-12 22:00:00         NaN

df_raw

                     Generation
2015-09-12 00:00:00         0.0
2015-09-12 01:00:00        19.0
2015-09-12 02:00:00         0.0
..............................
2021-12-11 21:00:00         0.0
2021-12-11 22:00:00       180.0
2021-12-11 23:00:00         0.0

Use GroupBy.transform with mean to get the average per MM-DD HH group, and replace the missing values with DataFrame.fillna:

df = df.fillna(df.groupby(df.index.strftime('%m-%d %H')).transform('mean'))
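To make the line above concrete, here is a minimal sketch on made-up data. The timestamps and values are invented for illustration; only the `Generation` column name comes from the question. Because the group key leaves out the year, the gaps in 2021 are filled with the mean of the 2019 and 2020 readings for the same month, day and hour.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the same two hours on the same calendar day in three years.
idx = pd.to_datetime([
    "2019-01-12 00:00", "2019-01-12 01:00",
    "2020-01-12 00:00", "2020-01-12 01:00",
    "2021-01-12 00:00", "2021-01-12 01:00",
])
df = pd.DataFrame(
    {"Generation": [10.0, 4.0, 20.0, 6.0, np.nan, np.nan]},
    index=idx,
)

# Group key without the year, e.g. "01-12 00".
key = df.index.strftime("%m-%d %H")

# Mean per key, broadcast back to the original rows (mean skips NaN).
group_mean = df.groupby(key).transform("mean")

# Only the missing cells are replaced; reported values stay untouched.
filled = df.fillna(group_mean)
print(filled)
```

No explicit loop over rows is needed, since the grouping and the fill are both vectorized.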

Then, if necessary, add DataFrame.interpolate:

df = df.interpolate(method='nearest')
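Note that `method='nearest'` relies on SciPy being installed. The sketch below shows the two-step fallback on invented data, and it swaps in `method='time'` (linear interpolation over the timestamps, which needs only pandas) as a stand-in. That is an assumption about what "interpolated mean of the nearest hours" should mean, not part of the answer above. In the sample, 02:00 on 03-05 is missing in every year, so the group mean cannot fill it and the interpolation step takes over.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data; 02:00 on 03-05 is missing in both years.
idx = pd.to_datetime([
    "2020-03-05 01:00", "2020-03-05 02:00", "2020-03-05 03:00",
    "2021-03-05 01:00", "2021-03-05 02:00", "2021-03-05 03:00",
])
df = pd.DataFrame(
    {"Generation": [10.0, np.nan, 30.0, 20.0, np.nan, 40.0]},
    index=idx,
)

# Step 1: fill from the same month/day/hour in other years.
key = df.index.strftime("%m-%d %H")
step1 = df.fillna(df.groupby(key).transform("mean"))
# The "03-05 02" group has no reported value at all, so both gaps remain NaN.

# Step 2: fall back to interpolation between the neighbouring reported hours.
# method="time" weights by the actual time distance and needs a sorted DatetimeIndex.
step2 = step1.interpolate(method="time")
print(step2)
```

With `method='nearest'` each gap would instead take the value of the closest reported timestamp rather than a value between its two neighbours.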