Pandas drop duplicates and replace the value by the nanmean of the duplicates

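I have a DataFrame (index = dates) that was created by appending 4 DataFrames. As a result, my index contains duplicates, typically 3 NaN and 1 value for the same date. My goal is to upsample this DataFrame to daily frequency (df = df.resample('1D')), but before that I have to remove the duplicates.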

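I want to drop the duplicated timestamps, subject to two conditions. I guess np.nanmean() would cover both of them: it returns NaN when a timestamp has no values, and the mean of the values otherwise.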

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Pt0': [np.nan, -42.0, np.nan, np.nan, -26.0,
                           np.nan, np.nan, np.nan, 0.0, -10.0]},
             index=['1984-06-10 00:00:00.096000064', '1984-06-10 00:00:00.096000064',
                    '1984-07-20 00:00:00.176000000', '1984-07-20 00:00:00.176000000',
                    '1984-07-28 00:00:00.192000000', '1984-07-28 00:00:00.192000000',
                    '1984-09-06 00:00:00.080000000', '1984-09-06 00:00:00.080000000',
                    '1984-09-06 00:00:00.271999936', '1984-09-06 00:00:00.271999936'])
df = 
                                Pt0
1984-06-10 00:00:00.096000064   NaN
1984-06-10 00:00:00.096000064 -42.0
1984-07-20 00:00:00.176000000   NaN
1984-07-20 00:00:00.176000000   NaN
1984-07-28 00:00:00.192000000 -26.0
1984-07-28 00:00:00.192000000   NaN
1984-09-06 00:00:00.080000000   NaN
1984-09-06 00:00:00.080000000   NaN
1984-09-06 00:00:00.271999936   0.0
1984-09-06 00:00:00.271999936 -10.0

df_dropped = 
                               Pt0
1984-06-10 00:00:00.096000064 -42.0
1984-07-20 00:00:00.176000000   NaN
1984-07-28 00:00:00.192000000 -26.0
1984-09-06 00:00:00.080000000 -5.0

I tried df = df.groupby('Pt0').mean().reset_index(), but it ended up skipping the NaN. I suppose it would work if df.groupby() had a nanmean() function.

How can I do this?

First, convert the index to datetime objects. Then you can group by the index, transform with np.nanmean, and finally call drop_duplicates:

df.index = pd.to_datetime(df.index)
out = df.groupby(level=0)['Pt0'].transform(np.nanmean).drop_duplicates().to_frame()

Output:

                                Pt0
1984-06-10 00:00:00.096000064 -42.0
1984-07-20 00:00:00.176000000   NaN
1984-07-28 00:00:00.192000000 -26.0
1984-09-06 00:00:00.271999936  -5.0
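
A possible alternative, offered only as a sketch and not as part of the answer above: GroupBy.mean() already skips NaN by default and returns NaN for a group that contains nothing but NaN, so grouping on the index level (not on the 'Pt0' column) may be enough on its own. The names deduped and daily are introduced here for illustration, and the final resample step assumes that a per-day mean is what the daily upsampling should produce.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, with the index parsed as datetimes.
df = pd.DataFrame(
    {"Pt0": [np.nan, -42.0, np.nan, np.nan, -26.0,
             np.nan, np.nan, np.nan, 0.0, -10.0]},
    index=pd.to_datetime([
        "1984-06-10 00:00:00.096000064", "1984-06-10 00:00:00.096000064",
        "1984-07-20 00:00:00.176000000", "1984-07-20 00:00:00.176000000",
        "1984-07-28 00:00:00.192000000", "1984-07-28 00:00:00.192000000",
        "1984-09-06 00:00:00.080000000", "1984-09-06 00:00:00.080000000",
        "1984-09-06 00:00:00.271999936", "1984-09-06 00:00:00.271999936",
    ]),
)

# Group rows that share the same index label. mean() ignores NaN inside a
# group, and a group made up only of NaN stays NaN.
deduped = df.groupby(level=0).mean()

# The index is now unique, so the frame can be resampled to daily frequency.
# Assumption: days that hold several timestamps are averaged as well.
daily = deduped.resample("1D").mean()

print(deduped)
```

On the sample data, deduped keeps one row per distinct timestamp, so the two all-NaN timestamps (1984-07-20 and 1984-09-06 00:00:00.080) remain separate rows. Series.drop_duplicates() compares values and not index labels, which is why the transform-based version collapses those two NaN rows into one, and it would do the same for any two timestamps that happen to share a mean.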