Add rows to a pandas DataFrame using interpolate

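I am trying to interpolate a pandas DataFrame that contains time series data. I have hourly data for temp, and I want to insert temp values at the half-hour points. That would give me an estimate of temp for every trading period of every day: 24 hours per day means 48 half-hour trading periods per day.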

My MWE is

import numpy as np
import pandas as pd
from datetime import datetime, date, timedelta
import pyarrow as pa
import pyarrow.parquet as pq

# my dataset
df = pd.DataFrame()
d1 = '2020-10-21'
d2 = '2020-10-22'
df['date'] = pd.to_datetime([d1]*24+[d2]*24, format='%Y-%m-%d')
df['time'] = pd.date_range(d1, periods=len(df), freq='H').time
df['temp'] = pd.DataFrame((50+20*np.sin(np.linspace(0,0.91*np.pi,len(df))))).values

# combine time and date
df.loc[:,'datetime'] = pd.to_datetime(df.date.astype(str)+' '+df.time.astype(str))
df = df.drop(['date','time'], axis=1)
df = df.set_index('datetime')

# trading period
df['tp'] = pd.DataFrame(df.index.hour.values*2+1).values

# interpolate to find temp and datetime for trading periods 2,4,6,...
for n in df.tp.values:
    df.loc[-1,'tp'] = n+1
    df = df.sort_values('tp').reset_index(drop=True)

#df = df.interpolate(method='linear')

print(df.head(10))

I am adapting the answer from a post, but I get the error TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead. I suspect it is caused by the line df.loc[-1,'tp'] = n+1, but I am not sure how to fix it.
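The error most likely arises because the frame is indexed by datetime, so the integer label -1 cannot be added to that index. As a minimal sketch (the three-row frame and the name new_label are made up for illustration, not taken from the MWE), a row can be added under a Timestamp label instead and then filled by time-based interpolation:

```python
import pandas as pd

# Hypothetical three-row stand-in for the hourly data in the question.
idx = pd.date_range("2020-10-21", periods=3, freq="60min", name="datetime")
df = pd.DataFrame({"temp": [50.0, 52.0, 56.0], "tp": [1, 3, 5]}, index=idx)

# The index holds Timestamps, so the new row label must be a Timestamp too, not -1.
new_label = pd.Timestamp("2020-10-21 00:30")
df.loc[new_label, "tp"] = 2  # adds a row whose temp is NaN

# Put the new row in chronological order, then fill its missing temp
# from the neighbouring readings, weighted by the time gaps.
df = df.sort_index()
df["temp"] = df["temp"].interpolate(method="time")

print(df)
```

Adding rows one at a time in a loop is slow for larger frames, so this is mainly useful for seeing what went wrong.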

Try:

df = df.resample('30T').mean().interpolate()
df['tp'] = ((df.index.hour * 60 + df.index.minute) / 30 + 1).astype(int)
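For reference, here is a self-contained sketch of the same idea. The names hourly, half_hourly and minutes are illustrative only, and the data is rebuilt from the formula in the question. It uses '30min', which is the spelling newer pandas releases prefer over '30T'.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data: 48 hourly readings over two days.
idx = pd.date_range("2020-10-21", periods=48, freq="60min", name="datetime")
hourly = pd.DataFrame(
    {"temp": 50 + 20 * np.sin(np.linspace(0, 0.91 * np.pi, 48))}, index=idx
)

# Upsample to a 30-minute grid. The new half-hour bins contain no
# observations, so mean() leaves NaN in them.
half_hourly = hourly.resample("30min").mean()

# Fill the NaN rows linearly between the neighbouring hourly readings.
half_hourly = half_hourly.interpolate(method="linear")

# Trading period within each day: 00:00 -> 1, 00:30 -> 2, 01:00 -> 3, ...
minutes = half_hourly.index.hour * 60 + half_hourly.index.minute
half_hourly["tp"] = np.asarray(minutes // 30 + 1, dtype=int)

print(half_hourly.head())
```

The grid ends at the last observation (23:00 on the second day), so this produces 95 rows rather than 96.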

Try asfreq followed by interpolate:

In [36]: df.asfreq('30T').interpolate()
Out[36]:
                          temp    tp
datetime
2020-10-21 00:00:00  50.000000   1.0
2020-10-21 00:30:00  50.607891   2.0
2020-10-21 01:00:00  51.215782   3.0
2020-10-21 01:30:00  51.821424   4.0
2020-10-21 02:00:00  52.427066   5.0
...                        ...   ...
2020-10-22 21:00:00  57.869280  43.0
2020-10-22 21:30:00  57.303145  44.0
2020-10-22 22:00:00  56.737010  45.0
2020-10-22 22:30:00  56.158416  46.0
2020-10-22 23:00:00  55.579822  47.0

[95 rows x 2 columns]
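Note that the output stops at 23:00 on the last day and that tp comes back as float. If all 48 periods are needed for every day, one possible sketch is to reindex onto an explicit half-hourly grid that runs through 23:30 and then recompute tp as an integer. The names full_idx and out are illustrative. Holding the last reading flat for the final half hour is an assumption, since there is no later observation to interpolate towards.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data: 48 hourly readings over two days.
idx = pd.date_range("2020-10-21", periods=48, freq="60min", name="datetime")
hourly = pd.DataFrame(
    {"temp": 50 + 20 * np.sin(np.linspace(0, 0.91 * np.pi, 48))}, index=idx
)

# Explicit half-hourly grid that also covers the final 23:30 slot.
full_idx = pd.date_range(
    idx[0], idx[-1] + pd.Timedelta(minutes=30), freq="30min", name="datetime"
)

# Reindexing inserts NaN rows at the half hours; linear interpolation fills
# the gaps between readings and carries the last reading forward into 23:30.
out = hourly.reindex(full_idx).interpolate(method="linear")

# Integer trading period per day: 00:00 -> 1, ..., 23:30 -> 48.
out["tp"] = np.asarray(out.index.hour * 2 + out.index.minute // 30 + 1, dtype=int)

print(out.tail())
```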