Filling missing data by interpolation in Python
I have a pandas DataFrame that looks like this:
         Date and Time    Seconds  Pressure (mmHg)  Temperature (C)
0  2021-05-13 13:00:00      0.000          709.719           26.551
1  2021-05-13 14:00:00   3600.001          709.364           25.966
2  2021-05-13 15:00:00   7200.001          708.698           25.331
3  2021-05-13 16:00:00  10800.001          707.689           25.184
4  2021-05-13 17:00:00  14400.001          707.206           25.184
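The pressure and temperature data were supposed to be recorded at 15-minute intervals, but the sensor was configured incorrectly and logged a reading only once per hour. Assuming the values vary linearly between readings, how can I expand the timestamps to 15-minute intervals and fill in the missing values between the hourly readings by linear interpolation?
I tried the solution suggested here, but I have many large files and that solution is not very fast.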
Use DataFrame.resample with Resampler.first to create the missing 15-minute rows between the hourly readings (they come out as NaN), then fill them with DataFrame.interpolate:
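import pandas as pd

# Parse the timestamps so that resample can bin on them
df['Date and Time'] = pd.to_datetime(df['Date and Time'])
# Upsample to a 15-minute grid: first() keeps each hourly reading and leaves
# NaN in the new slots, which interpolate() then fills linearly
df = (df.resample('15min', on='Date and Time')[['Pressure (mmHg)', 'Temperature (C)']]
        .first()
        .interpolate())
print(df)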
                     Pressure (mmHg)  Temperature (C)
Date and Time
2021-05-13 13:00:00        709.71900         26.55100
2021-05-13 13:15:00        709.63025         26.40475
2021-05-13 13:30:00        709.54150         26.25850
2021-05-13 13:45:00        709.45275         26.11225
2021-05-13 14:00:00        709.36400         25.96600
2021-05-13 14:15:00        709.19750         25.80725
2021-05-13 14:30:00        709.03100         25.64850
2021-05-13 14:45:00        708.86450         25.48975
2021-05-13 15:00:00        708.69800         25.33100
2021-05-13 15:15:00        708.44575         25.29425
2021-05-13 15:30:00        708.19350         25.25750
2021-05-13 15:45:00        707.94125         25.22075
2021-05-13 16:00:00        707.68900         25.18400
2021-05-13 16:15:00        707.56825         25.18400
2021-05-13 16:30:00        707.44750         25.18400
2021-05-13 16:45:00        707.32675         25.18400
2021-05-13 17:00:00        707.20600         25.18400
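For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the five sample rows from the question and wraps the same resample / first / interpolate steps in a helper. The helper name `upsample_linear` and its `freq` parameter are my own illustrative choices, not part of pandas. As a sanity check it also compares the pressure column against `numpy.interp` evaluated on elapsed seconds. This assumes the readings sit exactly on the hour, as in the sample.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (hourly readings).
df = pd.DataFrame({
    "Date and Time": [
        "2021-05-13 13:00:00",
        "2021-05-13 14:00:00",
        "2021-05-13 15:00:00",
        "2021-05-13 16:00:00",
        "2021-05-13 17:00:00",
    ],
    "Seconds": [0.000, 3600.001, 7200.001, 10800.001, 14400.001],
    "Pressure (mmHg)": [709.719, 709.364, 708.698, 707.689, 707.206],
    "Temperature (C)": [26.551, 25.966, 25.331, 25.184, 25.184],
})

VALUE_COLS = ["Pressure (mmHg)", "Temperature (C)"]


def upsample_linear(frame, freq="15min"):
    """Hypothetical helper: upsample to `freq` and fill the gaps linearly."""
    frame = frame.copy()
    frame["Date and Time"] = pd.to_datetime(frame["Date and Time"])
    return (
        frame.resample(freq, on="Date and Time")[VALUE_COLS]
        .first()         # keeps each hourly reading, NaN in the new slots
        .interpolate()   # default method is linear on an evenly spaced grid
    )


result = upsample_linear(df)
print(result)

# Cross-check one column against plain linear interpolation over elapsed time.
src_times = pd.to_datetime(df["Date and Time"])
src_secs = (src_times - src_times.iloc[0]).dt.total_seconds()
new_secs = (result.index - result.index[0]).total_seconds()
expected = np.interp(new_secs, src_secs, df["Pressure (mmHg)"])
print(np.allclose(result["Pressure (mmHg)"], expected))
```

With many files, the same helper can be called once per file after loading it. I have not benchmarked it against the other solution, so treat any speed difference as something to measure on your own data.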