How to resample and interpolate (cubic spline) time series data
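I need to resample a time series to a fixed interval, e.g. 3 months, while interpolating with a cubic spline.
What is the most efficient way to do this?
Sample data: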
dates = ('2020-09-24','2020-10-19','2020-12-17','2021-03-17','2021-06-17','2021-09-17','2022-03-17','2022-09-20','2023-09-19','2024-09-17','2025-09-17','2026-09-17','2027-09-17','2028-09-19','2029-09-18','2030-09-17','2031-09-17','2032-09-17','2035-09-18','2040-09-18','2045-09-19')
factors = ('1','0.999994','0.999875','1.000166','1.000303','1.000438','1.00056','1.000817','1.001046','1.001412','1.001525','1.001334','1.000685','0.999376','0.997456','0.994626','0.991244','0.986754','0.982072','0.962028','0.925136')
df = pd.DataFrame()
df['dates']=dates
df['factors']=factors
Try this:
import pandas as pd

dates = ('2020-09-24','2020-10-19','2020-12-17','2021-03-17','2021-06-17','2021-09-17','2022-03-17','2022-09-20','2023-09-19','2024-09-17','2025-09-17','2026-09-17','2027-09-17','2028-09-19','2029-09-18','2030-09-17','2031-09-17','2032-09-17','2035-09-18','2040-09-18','2045-09-19')
factors = ('1','0.999994','0.999875','1.000166','1.000303','1.000438','1.00056','1.000817','1.001046','1.001412','1.001525','1.001334','1.000685','0.999376','0.997456','0.994626','0.991244','0.986754','0.982072','0.962028','0.925136')

# Build the frame, index it by date, and convert the string factors to floats
df = pd.DataFrame()
df['dates'] = dates
df['factors'] = factors
df['dates'] = pd.to_datetime(df['dates'])
df.set_index(['dates'], inplace=True)
df['factors'] = df['factors'].astype(float)

# Shift for the bin labels so they land on the day of the first observation (the 24th).
# resample(..., loffset=...) is not available in pandas 2.x, so the shift is added to the index afterwards.
offset = pd.Timedelta(days=df.index[0].day - 1)

# Average within 3-month bins, then fill the empty bins by cubic interpolation (needs SciPy)
df = df.resample('3MS').mean().interpolate(method='cubic')
df.index = df.index + offset
print(df)
Output:
dates
2020-09-24 0.999997
2020-12-24 0.999875
2021-03-24 1.000166
2021-06-24 1.000303
2021-09-24 1.000438
... ...
2044-09-24 0.933154
2044-12-24 0.931170
2045-03-24 0.929196
2045-06-24 0.927170
2045-09-24 0.925136
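Note that the approach above averages the observations inside each 3-month bin and only interpolates the bins that are empty, so a printed value is not necessarily the spline evaluated at the printed date. If values at the exact target dates are wanted, one alternative is to fit a spline through the original points and evaluate it on a regular grid. The following is a minimal sketch of that idea, not the answerer's method; it assumes SciPy is installed, and the names x, spline, target and resampled are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline

dates = pd.to_datetime([
    '2020-09-24', '2020-10-19', '2020-12-17', '2021-03-17', '2021-06-17',
    '2021-09-17', '2022-03-17', '2022-09-20', '2023-09-19', '2024-09-17',
    '2025-09-17', '2026-09-17', '2027-09-17', '2028-09-19', '2029-09-18',
    '2030-09-17', '2031-09-17', '2032-09-17', '2035-09-18', '2040-09-18',
    '2045-09-19',
])
factors = np.array([
    1, 0.999994, 0.999875, 1.000166, 1.000303, 1.000438, 1.00056,
    1.000817, 1.001046, 1.001412, 1.001525, 1.001334, 1.000685, 0.999376,
    0.997456, 0.994626, 0.991244, 0.986754, 0.982072, 0.962028, 0.925136,
], dtype=float)

# Express each date as days elapsed since the first observation,
# so the spline is fitted against a plain numeric axis.
x = (dates - dates[0]).days.to_numpy(dtype=float)
spline = CubicSpline(x, factors)

# Regular 3-month grid starting on the first observation date.
# Grid points never go past the last observation, so nothing is extrapolated.
target = pd.date_range(start=dates[0], end=dates[-1], freq=pd.DateOffset(months=3))
x_new = (target - dates[0]).days.to_numpy(dtype=float)

# Evaluate the fitted spline at the exact target dates
resampled = pd.Series(spline(x_new), index=target, name='factors')
print(resampled)
```

Because the grid is capped at the last observation (2045-09-19), it ends on 2045-06-24 rather than 2045-09-24. Extending it further would require extrapolating the spline, which is usually best avoided.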