How to match a DatetimeIndex on everything but the year?
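I have a dataset with missing values and a DatetimeIndex. I would like to fill each missing value with the mean of the other values reported at the same month, day, and hour. If no value was reported at that specific month/day/hour in any year, I would like to use the interpolated mean of the nearest reported hour instead. How can I achieve this? Right now my approach is: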
df_Na = df_Na[df_Na['Generation'].isna()]
df_raw = df_raw[~df_raw['Generation'].isna()]
# reduce to month
same_month = df_raw[df_raw.index.month.isin(df_Na.index.month)]
# reduce to same day
same_day = same_month[same_month.index.day.isin(df_Na.index.day)]
# reduce to hour
same_hour = same_day[same_day.index.hour.isin(df_Na.index.hour)]
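df_Na holds all the missing values I would like to fill, and df_raw holds all the reported values I would like to take the mean from. I have a huge dataset, which is why I want to avoid a for loop at all costs.
My data looks like this: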
df_Na
Generation
2017-12-02 19:00:00 NaN
2021-01-12 00:00:00 NaN
2021-01-12 01:00:00 NaN
..............................
2021-02-12 20:00:00 NaN
2021-02-12 21:00:00 NaN
2021-02-12 22:00:00 NaN
df_raw
Generation
2015-09-12 00:00:00 0.0
2015-09-12 01:00:00 19.0
2015-09-12 02:00:00 0.0
..............................
2021-12-11 21:00:00 0.0
2021-12-11 22:00:00 180.0
2021-12-11 23:00:00 0.0
Use GroupBy.transform with mean to get the average per MM-DD HH slot, and replace the missing values with DataFrame.fillna:
df = df.fillna(df.groupby(df.index.strftime('%m-%d %H')).transform('mean'))
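To see what this step does, here is a minimal sketch on made-up data (the timestamps and values below are hypothetical, not taken from the question). It groups on the index's month, day, and hour attributes instead of a formatted string; that is a swapped-in variant of the strftime key above that should produce the same groups, and it may be cheaper on a very large index, though I have not measured it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the same two hourly slots in three different years.
idx = pd.to_datetime([
    "2019-01-12 00:00", "2019-01-12 01:00",
    "2020-01-12 00:00", "2020-01-12 01:00",
    "2021-01-12 00:00", "2021-01-12 01:00",
])
df = pd.DataFrame(
    {"Generation": [10.0, np.nan, 20.0, np.nan, np.nan, np.nan]},
    index=idx,
)

# Group by (month, day, hour) so that the year is ignored.
keys = [df.index.month, df.index.day, df.index.hour]

# transform('mean') returns one value per original row: the mean of its slot.
# NaN is skipped when averaging; a slot with no reported value stays NaN.
slot_mean = df["Generation"].groupby(keys).transform("mean")

# Only the missing rows are replaced; reported values are left untouched.
filled = df["Generation"].fillna(slot_mean)
print(filled)
```

In this toy frame the 00:00 slot has two reported values, so the missing 2021 entry gets their mean. The 01:00 slot was never reported in any year, so its rows remain NaN and still need a fallback.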
Then, if necessary, add DataFrame.interpolate:
df = df.interpolate(method='nearest')
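Note that method='nearest' is delegated to SciPy, so SciPy has to be installed for that call. If it is not available, one possible alternative (a different technique, not the method used above) is to reindex the reported values with method='nearest', which picks the value at the closest timestamp. A minimal sketch on hypothetical data, assuming the index is sorted and has no duplicate timestamps:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series with gaps between and after the reported values.
idx = pd.to_datetime([
    "2021-01-12 00:00", "2021-01-12 01:00", "2021-01-12 02:00",
    "2021-01-12 03:00", "2021-01-12 04:00",
])
s = pd.Series([10.0, np.nan, np.nan, 40.0, np.nan], index=idx, name="Generation")

# Keep only the reported values, then look up the nearest reported
# timestamp for every position of the original index.
known = s.dropna()
nearest_filled = known.reindex(s.index, method="nearest")
print(nearest_filled)
```

Here 01:00 is closest to the value reported at 00:00, while 02:00 and 04:00 are closest to the one reported at 03:00. This copies a neighbouring value rather than averaging, so whether it fits depends on how smooth the series is.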