Filling in the missing times using the average of the values in Python
I have this dataframe that is missing some times (I want a row for every second). See the example below:
import numpy as np
import pandas as pd

time = np.array([pd.to_datetime("2022-01-01 00:00:00"),pd.to_datetime("2022-01-01 00:00:01"),pd.to_datetime("2022-01-01 00:00:03"), pd.to_datetime("2022-01-01 00:00:04"),pd.to_datetime("2022-01-01 00:00:07"),pd.to_datetime("2022-01-01 00:00:09"), pd.to_datetime("2022-01-01 00:00:10")])
lat = [58.1, 58.4, 58.5, 58.9, 59,59.2, 59.5]
lng = [1.34, 1.44, 1.46, 1.48, 1.55, 1.57, 1.59]
df = pd.DataFrame({"time": time, "lat": lat, "lng" :lng})
time lat lng
2022-01-01 00:00:00 58.1 1.34
2022-01-01 00:00:01 58.4 1.44
2022-01-01 00:00:03 58.5 1.46
2022-01-01 00:00:04 58.9 1.48
2022-01-01 00:00:07 59.0 1.55
2022-01-01 00:00:09 59.2 1.57
2022-01-01 00:00:10 59.5 1.59
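I want to fill the gaps in time so that there is data for every second, with lat/lng filled by the average of the values in between. My plan was to create a time array for every second and use ffill or something similar to fill the missing points, but I can't figure out how. The expected output looks like this: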
time lat lng
2022-01-01 00:00:00 58.1 1.34
2022-01-01 00:00:01 58.4 1.44
2022-01-01 00:00:02 58.45 1.45
2022-01-01 00:00:03 58.5 1.46
2022-01-01 00:00:04 58.9 1.48
2022-01-01 00:00:05 58.933 1.5033
2022-01-01 00:00:06 58.966 1.5267
2022-01-01 00:00:07 59.0 1.55
2022-01-01 00:00:08 59.1 1.56
2022-01-01 00:00:09 59.2 1.57
2022-01-01 00:00:10 59.5 1.59
Please give me some advice on how to do this.
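"The average of the values in between" amounts to linear interpolation: each missing point is placed proportionally along the straight line between the two known samples around it, where $v_0$ and $v_1$ are the known values at the bracketing times $t_0 < t < t_1$. A worked check against the 3-second gap between 00:00:04 and 00:00:07:

```latex
\begin{aligned}
v(t) &= v_0 + \frac{t - t_0}{t_1 - t_0}\,(v_1 - v_0) \\
\mathrm{lat}(\text{00:00:05}) &= 58.9 + \tfrac{1}{3}\,(59.0 - 58.9) \approx 58.9333 \\
\mathrm{lng}(\text{00:00:06}) &= 1.48 + \tfrac{2}{3}\,(1.55 - 1.48) \approx 1.5267
\end{aligned}
```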
Create a DatetimeIndex, then add the missing times with DataFrame.asfreq and fill them in with DataFrame.interpolate:
# 's' = one-second frequency (pandas 2.2+ deprecates the uppercase 'S' alias)
df = df.set_index('time').asfreq(freq='s').interpolate()
print(df)
lat lng
time
2022-01-01 00:00:00 58.100000 1.340000
2022-01-01 00:00:01 58.400000 1.440000
2022-01-01 00:00:02 58.450000 1.450000
2022-01-01 00:00:03 58.500000 1.460000
2022-01-01 00:00:04 58.900000 1.480000
2022-01-01 00:00:05 58.933333 1.503333
2022-01-01 00:00:06 58.966667 1.526667
2022-01-01 00:00:07 59.000000 1.550000
2022-01-01 00:00:08 59.100000 1.560000
2022-01-01 00:00:09 59.200000 1.570000
2022-01-01 00:00:10 59.500000 1.590000
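The asker's original plan (build a time array for every second, then fill the holes) also works; it just needs interpolate() rather than ffill(), because ffill() repeats the previous reading instead of averaging. A minimal sketch, assuming pandas is installed; the names `indexed`, `full_range` and `filled` are illustrative only:

```python
import pandas as pd

# Sample data from the question (irregular one-second timestamps).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-01-01 00:00:00", "2022-01-01 00:00:01", "2022-01-01 00:00:03",
        "2022-01-01 00:00:04", "2022-01-01 00:00:07", "2022-01-01 00:00:09",
        "2022-01-01 00:00:10",
    ]),
    "lat": [58.1, 58.4, 58.5, 58.9, 59.0, 59.2, 59.5],
    "lng": [1.34, 1.44, 1.46, 1.48, 1.55, 1.57, 1.59],
})

# Step 1: index by time and build an explicit per-second time array.
indexed = df.set_index("time")
full_range = pd.date_range(indexed.index.min(), indexed.index.max(), freq="s")

# Step 2: reindex onto that array; the missing seconds become NaN rows.
filled = indexed.reindex(full_range).rename_axis("time")

# Step 3: linear interpolation fills each gap with evenly spaced values
# between its two neighbours (ffill() would only copy the previous row).
filled = filled.interpolate(method="linear")

print(filled)
```

Reindexing onto an explicit pd.date_range is handy when the grid should start or end outside the observed times; asfreq builds the same grid from the first to the last timestamp. Both approaches require the timestamps to be unique.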