Python pandas - Replace NaN values of a column with the mean of two datetime64[ns] columns
I am having trouble computing the mean of two datetime64[ns] columns.
The data frame looks like this:
import numpy as np
import pandas as pd

data = {
    'time1': ['2019-05-21 08:29:55', '2019-10-07 17:43:09', '2020-12-13 21:53:00', '2018-04-17 16:51:23', '2016-08-31 17:40:49'],
    'time2': ['2019-05-21 09:29:40', '2019-10-07 19:42:50', '2020-12-13 22:44:00', '2018-04-17 17:50:46', '2016-08-31 18:10:49'],
    # np.nan is used because the np.NaN alias was removed in NumPy 2.0
    'Avg_time[(time1+time2)/2]': [np.nan, np.nan, np.nan, np.nan, np.nan],
}
df = pd.DataFrame(data)
df
Output:
time1 time2 Avg_time[(time1+time2)/2]
0 2019-05-21 08:29:55 2019-05-21 09:29:40 NaN
1 2019-10-07 17:43:09 2019-10-07 19:42:50 NaN
2 2020-12-13 21:53:00 2020-12-13 22:44:00 NaN
3 2018-04-17 16:51:23 2018-04-17 17:50:46 NaN
4 2016-08-31 17:40:49 2016-08-31 18:10:49 NaN
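I want to replace the NaN values in the Avg_time[(time1+time2)/2] column with the mean of the time1 and time2 columns.
Note: the time1 and time2 columns are of type datetime64[ns] (they can be converted with to_datetime()).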
You can convert the datetimes to their native nanosecond (ns) integer representation with DataFrame.to_numpy and a cast to np.int64, take the row-wise mean, convert the result back to datetimes, and finally replace the missing values with Series.fillna:
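# Parse the string columns as datetimes
df['time1'] = pd.to_datetime(df['time1'])
df['time2'] = pd.to_datetime(df['time2'])
# Assumption: the target column is called 'Avg_time', so the question's longer name is shortened first
df = df.rename(columns={'Avg_time[(time1+time2)/2]': 'Avg_time'})
# Row-wise mean of the nanosecond integer values
arr = df[['time1','time2']].to_numpy().astype(np.int64).mean(axis=1)
df['Avg_time'] = df['Avg_time'].fillna(pd.Series(pd.to_datetime(arr), index=df.index))
print(df)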
time1 time2 Avg_time
0 2019-05-21 08:29:55 2019-05-21 09:29:40 2019-05-21 08:59:47.500
1 2019-10-07 17:43:09 2019-10-07 19:42:50 2019-10-07 18:42:59.500
2 2020-12-13 21:53:00 2020-12-13 22:44:00 2020-12-13 22:18:30.000
3 2018-04-17 16:51:23 2018-04-17 17:50:46 2018-04-17 17:21:04.500
4 2016-08-31 17:40:49 2016-08-31 18:10:49 2016-08-31 17:55:49.000
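Because Series.fillna only touches missing entries, any averages already present in the target column are kept. A minimal sketch of that behavior, assuming a hypothetical two-row frame named demo in which the first average is already filled in (the frame and the 2019-01-01 placeholder value are illustrative, not from the question). It uses an integer midpoint rather than a floating-point mean, which is a small variation on the approach above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 already holds a value, row 1 is missing (NaT).
demo = pd.DataFrame({
    'time1': pd.to_datetime(['2019-05-21 08:29:55', '2020-12-13 21:53:00']),
    'time2': pd.to_datetime(['2019-05-21 09:29:40', '2020-12-13 22:44:00']),
    'Avg_time': pd.to_datetime(['2019-01-01 00:00:00', None]),
})

# Nanoseconds since the epoch as int64; the explicit datetime64[ns] cast
# guards against columns stored at a coarser resolution.
ns = demo[['time1', 'time2']].to_numpy().astype('datetime64[ns]').astype(np.int64)

# Integer midpoint of each row, converted back to datetimes.
midpoint = pd.Series(pd.to_datetime(ns.sum(axis=1) // 2), index=demo.index)

# fillna replaces only the missing entry; the existing value in row 0 stays.
demo['Avg_time'] = demo['Avg_time'].fillna(midpoint)
print(demo)
```

Summing two int64 nanosecond values stays within the int64 range for present-day dates, but it would overflow for timestamps far in the future.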
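Alternative:
df['time1'] = pd.to_datetime(df['time1'])
df['time2'] = pd.to_datetime(df['time2'])
t1 = df['time1'].to_numpy().astype(np.int64)
t2 = df['time2'].to_numpy().astype(np.int64)
# Convert the averaged nanoseconds back to datetimes; otherwise the NaNs would be filled with plain numbers
df['Avg_time'] = df['Avg_time'].fillna(pd.Series(pd.to_datetime((t1 + t2) // 2), index=df.index))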
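The integer round trip can also be avoided altogether. The following is a minimal sketch, not part of the original answer, that assumes the same time1, time2 and Avg_time column names and relies on pandas' datetime arithmetic: subtracting two datetime columns gives a timedelta, which can be halved and added back to the start time. Only the first two rows of the question's data are used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'time1': pd.to_datetime(['2019-05-21 08:29:55', '2019-10-07 17:43:09']),
    'time2': pd.to_datetime(['2019-05-21 09:29:40', '2019-10-07 19:42:50']),
    'Avg_time': [np.nan, np.nan],
})

# Midpoint = start + half of the elapsed time; no int64 cast is needed.
midpoint = df['time1'] + (df['time2'] - df['time1']) / 2

# Make the target column datetime-typed (NaN becomes NaT) before filling,
# so the result is a datetime column rather than a mixed one.
df['Avg_time'] = pd.to_datetime(df['Avg_time']).fillna(midpoint)
print(df)
```

This form does not depend on the timestamps being stored in nanoseconds and cannot overflow the way adding two epoch offsets can.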