Python pandas - Replace NaN values of a column with the mean of two datetime64[ns] columns

I'm having trouble computing the mean of two datetime64[ns] columns.

The DataFrame looks like this:

import numpy as np
import pandas as pd

data = {
    'time1': ['2019-05-21 08:29:55', '2019-10-07 17:43:09', '2020-12-13 21:53:00', '2018-04-17 16:51:23', '2016-08-31 17:40:49'],
    'time2': ['2019-05-21 09:29:40', '2019-10-07 19:42:50', '2020-12-13 22:44:00', '2018-04-17 17:50:46', '2016-08-31 18:10:49'],
    # np.nan rather than np.NaN: the np.NaN alias was removed in NumPy 2.0
    'Avg_time[(time1+time2)/2]': [np.nan, np.nan, np.nan, np.nan, np.nan]
}
df = pd.DataFrame(data)
df

Output:

          time1                time2            Avg_time[(time1+time2)/2]
0   2019-05-21 08:29:55  2019-05-21 09:29:40         NaN
1   2019-10-07 17:43:09  2019-10-07 19:42:50         NaN
2   2020-12-13 21:53:00  2020-12-13 22:44:00         NaN
3   2018-04-17 16:51:23  2018-04-17 17:50:46         NaN
4   2016-08-31 17:40:49  2016-08-31 18:10:49         NaN

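I want to replace the NaN values in the Avg_time[(time1+time2)/2] column with the mean of the time1 and time2 columns.

Note: the time1 and time2 columns are of type datetime64[ns] (they can be converted with to_datetime()).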

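You can convert the datetimes to their native integer format (nanoseconds) with DataFrame.to_numpy and a cast to np.int64, take the row-wise mean, convert the result back to datetimes, and then replace the missing values with Series.fillna. The code below assumes the target column is named Avg_time: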
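# Parse the string columns into datetime64[ns]
df['time1'] = pd.to_datetime(df['time1'])
df['time2'] = pd.to_datetime(df['time2'])

# Cast to integer nanoseconds and average each row
arr = df[['time1','time2']].to_numpy().astype(np.int64).mean(axis=1)
# Convert the averages back to datetimes and fill the missing values
df['Avg_time'] = df['Avg_time'].fillna(pd.Series(pd.to_datetime(arr), index=df.index))
print(df)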
                time1               time2                Avg_time
0 2019-05-21 08:29:55 2019-05-21 09:29:40 2019-05-21 08:59:47.500
1 2019-10-07 17:43:09 2019-10-07 19:42:50 2019-10-07 18:42:59.500
2 2020-12-13 21:53:00 2020-12-13 22:44:00 2020-12-13 22:18:30.000
3 2018-04-17 16:51:23 2018-04-17 17:50:46 2018-04-17 17:21:04.500
4 2016-08-31 17:40:49 2016-08-31 18:10:49 2016-08-31 17:55:49.000
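
To illustrate why the cast to np.int64 works, here is a minimal sketch with made-up timestamps close to the Unix epoch (the values and variable names are illustrative, not from the question). It assumes the data is forced to nanosecond resolution, so that the integers are nanoseconds since 1970-01-01:

```python
import numpy as np
import pandas as pd

# Two illustrative timestamps, one and two seconds after the Unix epoch.
s = pd.Series(pd.to_datetime(['1970-01-01 00:00:01', '1970-01-01 00:00:02']))

# Force nanosecond resolution so the integers below are nanosecond counts.
ns = s.astype('datetime64[ns]').to_numpy().astype(np.int64)
print(ns.tolist())  # [1000000000, 2000000000]

# Averaging the integers and converting back yields the midpoint in time.
midpoint = pd.Timestamp(int(ns.mean()))
print(midpoint)
```

Note that taking the mean in floating point can lose sub-microsecond precision for present-day dates, which is harmless for timestamps with whole-second values like the ones in the question.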

Alternative:

df['time1'] = pd.to_datetime(df['time1'])
df['time2'] = pd.to_datetime(df['time2'])

t1 = df['time1'].to_numpy().astype(np.int64)
t2 = df['time2'].to_numpy().astype(np.int64)
# Convert the averaged nanosecond values back to datetimes before filling
df['Avg_time'] = df['Avg_time'].fillna(pd.Series(pd.to_datetime((t1 + t2) / 2), index=df.index))
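
A different route to the same midpoint, not part of the answer above, is to stay in datetime arithmetic: add half of the difference between the two columns to the earlier one. The sketch below assumes the target column has been renamed to Avg_time and uses only the first three rows of the question's data; it first converts the all-NaN column to a datetime column so that fillna keeps a datetime dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'time1': ['2019-05-21 08:29:55', '2019-10-07 17:43:09', '2020-12-13 21:53:00'],
    'time2': ['2019-05-21 09:29:40', '2019-10-07 19:42:50', '2020-12-13 22:44:00'],
    'Avg_time': [np.nan, np.nan, np.nan],  # assumed column name
})
df['time1'] = pd.to_datetime(df['time1'])
df['time2'] = pd.to_datetime(df['time2'])

# Midpoint = start + half of the elapsed time (Timedelta arithmetic, no int cast).
midpoint = df['time1'] + (df['time2'] - df['time1']) / 2

# Turn the NaN placeholders into NaT, then fill only the missing entries.
df['Avg_time'] = pd.to_datetime(df['Avg_time']).fillna(midpoint)
print(df)
```

Because the sum of two timestamps is never formed, this variant also avoids the integer overflow that t1 + t2 could hit for dates far in the future.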